readmbox
program. This change can perhaps be easiest carried in two steps:
readmbox
main()
initializes the mailbox and then interacts with the
user.
<main()
>= (U->)
int main(int argc, char * argv[]) {
if (argc != 2) {
cerr << "Command format is \"" << argv[0] << " mailbox\".\n";
return EXIT_FAILURE;
}
mailbox mbox;
const string resp = mbox.initialize(argv[1]);
if (resp != "") {
cerr << resp << ".\n";
return EXIT_FAILURE;
}
do_io(mbox);
return EXIT_SUCCESS;
}
Definesmain
(links are to index).
User interaction reads a string from the user, parses it, and executes a
command based on the first word input from the user. do_io()
is getting a
little shaggy, and should be broken into smaller pieces; maybe next time.
<user interaction>= (U->) static void do_io(mailbox & mbox) { const int mcnt = mbox.msgcnt(); if (mcnt == 1) cout << "There is 1 message.\n"; else cout << "There are " << mcnt << " messages.\n"; if (mcnt > 0) while (true) { cout << "* "; string resp; getline(cin, resp); if (cin.eof()) break; deque<string>cmd; parse_cmd(resp, cmd); if (cmd.size() == 0) continue; resp = cmd[0]; cmd.pop_front(); if (is_number(resp)) { const int mno = atoi(resp.c_str()); if ((mno < 1) || (mno > mcnt)) cout << "Message number " << mno << " is out of range; the valid range is [1, " << mcnt << "].\n"; else mbox.output(cout, mno); } else if ((resp[0] == 'q') || (resp[0] == 'Q')) break; else if (resp == "order") { if (cmd.size() == 0) mbox.restore_original_order(); else if (cmd.size() == 2) mbox.order(cmd[0], cmd[1]); else cout << "Garbled command format for order. Proper command format " << "is either \"order field word\" or \"order\".\n"; } else { cout << "\""<< resp << "\" is an unknown command. "; cout << "Known commands are\n"; cout << " n - print message n.\n"; cout << " quit - end the program.\n"; cout << " order - re-order the message list.\n"; continue; } } }
Definesdo_io
(links are to index).
Command parsing accepts a string input from the user and returns a dequeue of the words in the user's input string. In this case a word is a sequence of non-white-space characters.
Parsing itself skips over white space, which finds the start of a word, and then skips to white space, which finds the end of the word.
<parse user command>= (U->) <white-space predicates> static void parse_cmd(const string & str, deque<string> & cmd) { cmd.erase(cmd.begin(), cmd.end()); string::const_iterator start = str.begin(); const string::const_iterator end = str.end(); while (start != end) { start = find_if(start, end, is_nonwhite_space); if (start == end) break; string::const_iterator const e = find_if(start, end, is_white_space); cmd.push_back(string(start, e)); start = e; } }
Definesparse_cmd
(links are to index).
The white-space and non-white-space predicates for find_if()
.
<white-space predicates>= (<-U) static bool is_white_space(char ch) { return ((ch == ' ') || (ch == '\t')); } static bool is_nonwhite_space(char ch) { return !is_white_space(ch); }
Definesis_nonwhite_space
,is_white_space
(links are to index).
A message number is just a string of digits, and is_number()
checks to make
sure. The is_not_digit()
circumlocution is not necessary because the STL
provides the unary predicate adaptor not1()
, but we haven't got to function
adopters yet.
<number test>= (U->) static bool is_not_digit(char c) { return !isdigit(c); } static bool is_number(const string & str) { return find_if(str.begin(), str.end(), is_not_digit) == str.end(); }
Definesis_not_digit
,is_number
(links are to index).
That's it for the readmbox
program itself.
<readmbox.cc
>= #include <iostream> #include <string> #include <deque> #include <algorithm> #include <ctype.h> #include <stdlib.h> #include "mailbox-string.h" <number test> <parse user command> <user interaction> <main()
>
The Mailbox Class Interface
The mailbox interface keeps the three public methods it had in the previous
assignment.
<public mailbox methods>= (U->) [D->] string initialize(const char * mbox_name); unsigned msgcnt(void) const; void output(ostream & out, unsigned mno) const;
The order command merits its own method in the mailbox's public interface.
<public mailbox methods>+= (U->) [<-D->] void order(const string & field, const string & word);
However, the order command can perform two operations: it orders the messages
in the message list according to some word in a header field and it
re-establishes the original ordering again. The form of the order()
member function given above is suited for the first operation, but it isn't
clear how to indicate the second operation.
Some arbitrary interpretation of order()
's arguments could serve to
indicate the second operation. For example, if either or both of the arguments
is the empty string, then order()
re-establishes the original order.
However, this approach is not very clear, requiring arbitrary interpretation of
argument values, and doesn't distinguish well the two different operations
performed by order()
.
Adding a second member function, one more suited to the order's second
operation, is another possibility. For example, overloading order()
with
a member function having no arguments
void order(void);
is one way to distinguish between the two order operations. The order()
with arguments would re-order the message list while the order()
without
arguments would re-establish the original message-list order.
Overloading the member function takes care of the objection about arbitrarily interpreting argument values, but it doesn't do much to distinguish between the two operations performed by order (to see why not, re-read the last sentence of the last paragraph). Fortunately, that's an easy objection to take care of: skip overloading and give the function a meaningful name.
<public mailbox methods>+= (U->) [<-D] void restore_original_order(void);
Programming is just another form of communication, and, as with all forms of communication, the clearer you can be, the better your ability to communicate.
The private parts of the mailbox object begin to reveal how the mailbox is
represented within readmbox
.
To keep things interesting, change the mailbox representation from a vector of characters to a vector of strings. The order command doesn't favor one form of mailbox over the other; this is just a way to illustrate an alternate representation for mailboxes.
<private mailbox elements>= (U->) [D->] typedef vector<string> Mailbox; typedef Mailbox::iterator Mailbox_itr; Mailbox mbox;
DefinesMailbox
,Mailbox_itr
(links are to index).
The changes to the mailbox, if any, that are required by the order and re-order commands will depend on how these commands are implemented. The exact implementation details are worked out below; for now it's enough to outline the implementations to make sure all the necessary data for the commands is available.
A simple way to implement re-ordering is to tag each message with its original order in the mailbox. Then re-ordering can be implemented by sorting on the original message order tag. The data needed by re-ordering is an integer indicating the message's order position in the original mailbox.
At first thought, it seems that ordering can be implemented in the same way as re-ordering: assign each message an integer tag based on whether or not the search pattern matches the message, and then sort on the tags. By properly assigning the tag values, the message list will be sorted as required by the order command.
A little more thought, however, shows that an integer tag isn't necessary. The order command moves the matching messages before the unmatching messages in the message list. This means the order command only needs to distinguish between messages that match the search pattern and those that don't, which only requires a boolean value. Booleans are better than integers because they're simpler then integers - only two values to worry about - and it's easier to determine which boolean value to use - matching or not.
Although the details are still to be developed, it seems that the order and re-order commands can be implemented by associating a boolean value and an integer value to each message.
Before going on, it might be wise to rethink the way in which messages are
represented. In the previous example solution, messages were
represented by the two arrays message_starts
and header_ends
so that
message i
was represented by the values message_starts[i]
,
header_ends[i]
and message_starts[i+1]
. That scheme could be continued
here by using an array of integers and an array of booleans.
However, treating a message as a set of values from various vectors is not a clear representation. In addition, the interpretation of some of those values was not straightforward, and was contrary to the usual STL interpretations (the interpretation message starts, in particular).
Representational clarity can be regained by gathering all data associated with a message in one place using a structure for messages. Data interpretation can be cleaned up by using explicit data items to indicate various features of the message.
<private mailbox elements>+= (U->) [<-D->] struct Message { Mailbox_itr header_start; // iterator to the first line of the header Mailbox_itr header_end; // iterator to the first line past the header Mailbox_itr content_start; // iterator to the first line of the content Mailbox_itr content_end; // iterator to the first line past the content unsigned original_order; bool matching; }; typedef vector<Message> Message_list; typedef Message_list::iterator Msglst_itr; Message_list msg_list;
DefinesMessage
,Message_list
,Msglst_itr
(links are to index).
The private functions to set up the mailbox and message list.
<private mailbox elements>+= (U->) [<-D->] bool read(const char * mbox_name); void find_message_starts(void); bool find_header_ends(void);
These funky private functions are needed to get around some scope C++ restrictions and to serve as functional parameters for STL generic algorithms.
<private mailbox elements>+= (U->) [<-D] static bool original_msg_order(const Message &, const Message &); static bool reorder_msg(const Message &); static void search_header(Message &, const string &, const string &);
And that's it for the mailbox class interface.
<mailbox-string.h
>=
#ifndef _mailbox_char_h_defined_
#define _mailbox_char_h_defined_
#include <vector>
#include <string>
class mailbox {
public:
<public mailbox methods>
private:
<private mailbox elements>
};
#endif
The Mailbox Class Implementation
Now all that's left is to implement the mailbox class functions.
<mailbox-string.cc
>= [D->]
#include <algorithm>
#include <fstream>
#include <assert.h>
#include "mailbox-string.h"
The public member function initialize()
reads in the external mailbox file,
and then find the message starts and header ends. If anything goes wrong,
return a string describing the problem, otherwise return the empty string.
<mailbox-string.cc
>+= [<-D->]
string mailbox::initialize(const char * mbox_name) {
if (!read(mbox_name))
return string("Can't open ") + mbox_name;
find_message_starts();
if (!find_header_ends())
return string("Mailbox appears to be corrupted: message header not ") +
string("followed by a blank line");
return string("");
}
Definesinitialize
(links are to index).
The msgcnt()
public member function returns the number of messages in the
message list.
<mailbox-string.cc
>+= [<-D->]
unsigned mailbox::msgcnt(void) const {
return msg_list.size();
}
Definesmsgcnt
(links are to index).
Print the content for message mno
; don't forget to move mno
one to the
left so it's a sensible vector index.
This code can also be written using STL's output stream iterators, but we haven't studied them yet.
<mailbox-string.cc
>+= [<-D->]
void mailbox::output(ostream & out, unsigned mno) const {
assert((1 <= mno) && (mno <= msgcnt()));
const Message & m = msg_list[mno - 1];
for (Mailbox_itr i = m.content_start; i != m.content_end; i++)
out << *i << "\n";
}
Definesoutput
(links are to index).
Read in the mailbox as a vector of strings. mbox
will always contain an
extra message start; this simplifies later code by insuring that mbox
is
always non-empty and by eliminating a special case for the last message in the
mailbox.
This code can be streamlined by using the STL's input-stream iterators, but we haven't studied them yet.
<mailbox-string.cc
>+= [<-D->]
bool mailbox::read(const char * mbox_name) {
ifstream mbox_file(mbox_name);
if (!mbox_file)
return false;
string s;
while (getline(mbox_file, s))
mbox.push_back(s);
mbox.push_back("");
mbox.push_back("From ");
return true;
}
Definesread
(links are to index).
Finding the message starts is divided into two tasks:
mbox
into the temporary vector
starts
.
starts
to construct the messages, which are pushed
into msg_list
.
<mailbox-string.cc
>+= [<-D->]
static bool is_from(const string str) {
return (str.find("From ") == 0);
}
void mailbox::find_message_starts(void) {
Mailbox_itr start = mbox.begin();
const Mailbox_itr end = mbox.end();
vector<Mailbox_itr> starts;
while (true) {
start = find_if(start, end, is_from);
if (start == end) break;
starts.push_back(start);
start++;
}
for (unsigned i = 0; i < starts.size() - 1; i++) {
Message m;
m.header_start = starts[i] + 1;
m.content_end = starts[i + 1] - 1;
m.original_order = i;
msg_list.push_back(m);
}
}
Definesfind_message_starts
,is_from
(links are to index).
Finding the message-header ends involves searching each message for the left-most blank line. RFC 822 requires that such a blank line exist, even if the message has no content; failure to find the blank line indicates a trashed mailbox.
<mailbox-string.cc
>+= [<-D->]
static bool is_blank(const string str) {
return (str.length() == 0);
}
bool mailbox::find_header_ends(void) {
for (Msglst_itr i = msg_list.begin(); i != msg_list.end(); i++) {
Message & m = *i;
Mailbox_itr sep = find_if(m.header_start, m.content_end, is_blank);
if (sep == m.content_end)
return false;
m.header_end = sep;
m.content_start = sep + 1;
}
return true;
}
Definesfind_header_ends
,is_blank
(links are to index).
Ordering the messages is done in two passes over the message list. The first pass marks the messages that should be moved to the head of the message list; the second pass does the moving.
As explained above, the idea is to move all matching messages before all
non-matching message, which can be done by the partition()
generic
algorithm. The requirement that relative order be maintained can be handled by
the stable_partition()
generic algorithm.
<mailbox-string.cc
>+= [<-D->]
bool mailbox::reorder_msg(const Message & m) {
return m.matching;
}
void mailbox::order(const string & field, const string & word) {
for (Msglst_itr i = msg_list.begin(); i != msg_list.end(); i++)
search_header(*i, field, word);
stable_partition(msg_list.begin(), msg_list.end(), reorder_msg);
}
Definesorder
,reorder_msg
(links are to index).
search_header_field()
returns true if the header field field
has the
label label
and contains the word word
in its text-line. If the header
field field
doesn't have the label label
or doesn't contain the word
word
in its text-line, search_header_field()
returns false.
<mailbox-string.cc
>+= [<-D->]
static bool search_header_field(
const string & field, const string & label, const string & word) {
if (field.find(label) != 0) return false;
const unsigned int i = label.size();
if (field[i] != ':') return false;
return field.find(word, i + 1) != string::npos;
}
Definessearch_header_field
(links are to index).
If message msg
's header contains a field labeled with label
and
containing the word word
in its text field, then search_header()
sets
msg.matching
to true to indicate that msg
should be re-ordered in the
message list. Otherwise msg.matching
is set to false to indicate that
msg
should be left alone in the message list.
<mailbox-string.cc
>+= [<-D->]
void mailbox::search_header(
Message & msg, const string & label, const string & word) {
msg.matching = false;
for (Mailbox_itr i = msg.header_start; i != msg.header_end; i++)
if (search_header_field(*i, label, word)) {
msg.matching = true;
break;
}
}
Definessearch_header
(links are to index).
Because we took the time to remember the order of the messages in the mailbox, restoring the original message order is simplicity itself: just order the messages based on their initial positions in the mailbox.
<mailbox-string.cc
>+= [<-D]
bool mailbox::original_msg_order(const Message & m1, const Message & m2) {
return m1.original_order < m2.original_order;
}
void mailbox::restore_original_order(void) {
stable_sort(msg_list.begin(), msg_list.end(), original_msg_order);
}
Definesoriginal_msg_order
,restore_original_order
(links are to index).
mailbox-string.cc
>: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11
mailbox-string.h
>: D1
main()
>: D1, U2
readmbox.cc
>: D1
This page last modified on 7 July 2000.