readmbox
main()
initializes the mailbox and then interacts with the
user.
<main()
>= (U->)
int main(int argc, char * argv[]) {
if (argc != 2) {
cerr << "Command format is \"" << argv[0] << " mailbox\".\n";
return EXIT_FAILURE;
}
mailbox mbox;
const string resp = mbox.initialize(argv[1]);
if (resp != "") {
cerr << resp << ".\n";
return EXIT_FAILURE;
}
do_io(mbox);
return EXIT_SUCCESS;
}
Definesmain
(links are to index).
Prompt the user for input and parse the user's input into words. Return true if the user entered anything; return false on end of input.
<user interaction>= (U->) [D->] typedef deque<string> command; static bool get_command(command & cmd) { do { cout << "* "; string resp; getline(cin, resp); if (cin.eof()) return false; cmd.erase(cmd.begin(), cmd.end()); parse_cmd(resp, cmd); } while (cmd.size() == 0); return true; }
Definescommand
,get_command
(links are to index).
Check an order command and, if syntactically correct, do the right thing; otherwise, print an error message. A new order command adds a new branch to the check.
<user interaction>+= (U->) [<-D->] static void do_order(const command & cmd, mailbox & mbox) { if (cmd.size() == 0) mbox.restore_original_order(); else if (cmd.size() == 1) mbox.order(cmd[0]); else if (cmd.size() == 2) mbox.order(cmd[0], cmd[1]); else cout << "Garbled command format for order. Proper command format " << "is \"order field word\", \"order word\" or \"order\".\n"; }
Definesdo_order
(links are to index).
do_delete()
is just a thinly veiled version of do_order()
.
<user interaction>+= (U->) [<-D->] static void do_delete(const command & cmd, mailbox & mbox) { if (cmd.size() == 0) mbox.restore_deleted_msgs(); else if (cmd.size() == 2) mbox.delete_msgs(cmd[0], cmd[1]); else cout << "Garbled command format for delete. Proper command format " << "is either \"delete field word\" or \"delete\".\n"; }
Definesdo_delete
(links are to index).
Print the contents of the requested message resp
if the request is for an
existing message; otherwise, print an error message.
<user interaction>+= (U->) [<-D->] static void do_print(const string & resp, const mailbox & mbox) { assert(is_number(resp)); const int mno = atoi(resp.c_str()); const unsigned mcnt = mbox.msgcnt(); if ((mno < 1) || (mno > mcnt)) { cout << "Message number " << mno << " is out of range; the "; if (mcnt == 0) cout << "message list is empty"; else cout << "valid range is [1, " << mcnt << "]"; cout << ".\n"; } else mbox.output(cout, mno); }
Definesdo_print
(links are to index).
User interaction reads a string from the user, parses it, and executes a command based on the first word input from the user.
<user interaction>+= (U->) [<-D] static void do_io(mailbox & mbox) { const int mcnt = mbox.msgcnt(); if (mcnt == 1) cout << "There is 1 message.\n"; else cout << "There are " << mcnt << " messages.\n"; if (mcnt > 0) { command cmd; while (get_command(cmd)) { command::value_type resp = cmd[0]; cmd.pop_front(); if (is_number(resp)) do_print(resp, mbox); else if ((resp[0] == 'q') || (resp[0] == 'Q')) break; else if (resp == "order") do_order(cmd, mbox); else if (resp == "delete") do_delete(cmd, mbox); else { cout << "\""<< resp << "\" is an unknown command. "; cout << "Known commands are\n"; cout << " n - print message n.\n"; cout << " quit - end the program.\n"; cout << " order - re-order the message list.\n"; cout << " delete - delete messages from the message list.\n"; } } } }
Definesdo_io
(links are to index).
Command parsing accepts a string input from the user and returns a dequeue of the words in the user's input string. In this case a word is a sequence of non-white-space characters.
Parsing itself skips over white space, which finds the start of a word, and then skips to white space, which finds the end of the word.
<parse user command>= (U->) <white-space predicates> static void parse_cmd(const string & str, deque<string> & cmd) { cmd.erase(cmd.begin(), cmd.end()); string::const_iterator start = str.begin(); const string::const_iterator end = str.end(); while (start != end) { start = find_if(start, end, is_nonwhite_space); if (start == end) break; const string::const_iterator e = find_if(start, end, is_white_space); cmd.push_back(string(start, e)); start = e; } }
Definesparse_cmd
(links are to index).
The white-space and non-white-space predicates for find_if()
.
<white-space predicates>= (<-U) static bool is_white_space(char ch) { return ((ch == ' ') || (ch == '\t')); } static bool is_nonwhite_space(char ch) { return !is_white_space(ch); }
Definesis_nonwhite_space
,is_white_space
(links are to index).
A message number is just a string of digits, and is_number()
checks to make
sure. The is_not_digit()
circumlocution is not necessary because the STL
provides the unary predicate adaptor not1()
, but we haven't got to function
adopters yet.
<number test>= (U->) static bool is_not_digit(char c) { return !isdigit(c); } static bool is_number(const string & str) { return find_if(str.begin(), str.end(), is_not_digit) == str.end(); }
Definesis_not_digit
,is_number
(links are to index).
That's it for the readmbox
program itself.
<readmbox.cc
>= #include <iostream.h> #include <string> #include <deque> #include <algorithm> #include <ctype.h> #include <stdlib.h> #include <assert.h> #include "mailbox-string.h" <number test> <parse user command> <user interaction> <main()
>
The Mailbox Class Interface
The mailbox interface grows by one method: the method needed to implement the
new form of the order command. The rest of the public interface stays the
same.
<public mailbox methods>= (U->) string initialize(const char * mbox_name); unsigned msgcnt(void) const; void output(ostream & out, unsigned mno) const; void order(const string & field, const string & word); void order(const string & word); void restore_original_order(void); void delete_msgs(const string & field, const string & word); void restore_deleted_msgs(void);
The private parts of the mailbox object begin to reveal how the mailbox is
represented within readmbox
.
The mailbox text is stored internally as a vector of strings.
<private mailbox elements>= (U->) [D->] typedef vector<string> Mailbox; typedef Mailbox::iterator Mailbox_itr; typedef Mailbox::const_iterator Mailbox_citr; Mailbox mbox;
DefinesMailbox
,Mailbox_citr
,Mailbox_itr
(links are to index).
Folded header fields mean that a header field may be spread out over more than
one line in mbox
. This suggest representing each unfolded header field as
an iterator range (s, e)
where s
points to the first mbox
line in
the header field and e
points to the first mbox
line after the header
field.
The new form of the order command requires knowing, for each message, which words appear in the text part of the header fields, and how many times each word appears. A multiset of words (strings) can easily provide this information.
<private mailbox elements>+= (U->) [<-D->] typedef pair<Mailbox_citr, Mailbox_citr> Header_field; typedef vector<Header_field> Header; typedef Header::iterator Header_itr; struct Message { Header header; Mailbox_citr content_start; // iterator to the first line of the content Mailbox_citr content_end; // iterator to the first line past the content unsigned original_order; unsigned matching_order; multiset<string> words; };
DefinesHeader
,Header_field
,Header_itr
,Message
(links are to index).
Msg_starts
is an auxiliary data structure helpful when parsing mbox
.
<private mailbox elements>+= (U->) [<-D->] typedef vector<Mailbox_itr> Msg_starts;
DefinesMsg_starts
(links are to index).
Representing the message list as a list instead of a vector makes it easier and faster to move groups of messages into and out of the message list (at the cost of eliminating direct access to individual messages).
<private mailbox elements>+= (U->) [<-D->] typedef list<Message> Message_list; typedef Message_list::iterator Msglst_itr; typedef Message_list::const_iterator Msglst_citr; Message_list msg_list, deleted_msgs;
DefinesMessage_list
,Msglst_citr
,Msglst_itr
(links are to index).
The private functions to set up the mailbox and message list.
<private mailbox elements>+= (U->) [<-D->] bool read(const char * mbox_name); string find_message_starts(Msg_starts &); Msglst_itr partition_matches(const string & field, const string & word);
These funky private functions are needed to get around some scope C++ restrictions and to serve as functional parameters for STL generic algorithms.
<private mailbox elements>+= (U->) [<-D->] static bool original_msg_order(const Message &, const Message &); static bool msg_greaterthan(const Message &, const Message &); static bool reorder_msg(const Message &); static void search_header(Message &, const string &, const string &); static bool search_header_field( const Header_field &, const string &, const string &);
<private mailbox elements>+= (U->) [<-D] string parse_mailbox(void); string parse_message(Mailbox_itr &, Mailbox_itr &, Message &); string parse_header(Mailbox_citr, Mailbox_citr, Header &); void count_header_words(Message &); void listify_words(vector<string>&, const Mailbox_citr&, const Mailbox_citr&);
And that's it for the mailbox class interface.
<mailbox-string.h
>=
#ifndef _mailbox_char_h_defined_
#define _mailbox_char_h_defined_
#include <list>
#include <vector>
#include <string>
#include <set>
using namespace std;
class mailbox {
public:
<public mailbox methods>
private:
<private mailbox elements>
};
#endif
Definesmailbox
(links are to index).
The Mailbox Class Implementation
Now all that's left is to implement the mailbox class functions.
<mailbox-string.cc
>= [D->]
#include <algorithm>
#include <fstream.h>
#include <assert.h>
#include <strings.h>
#include <iterator>
#include "mailbox-string.h"
The public member function initialize()
reads in the external mailbox file,
and then find the message starts and header ends. If anything goes wrong,
return a string describing the problem, otherwise return the empty string.
<mailbox-string.cc
>+= [<-D->]
string mailbox::initialize(const char * mbox_name) {
if (!read(mbox_name))
return string("Can't open ") + mbox_name;
return parse_mailbox();
}
Definesinitialize
(links are to index).
The msgcnt()
public member function returns the number of messages in the
message list.
<mailbox-string.cc
>+= [<-D->]
unsigned mailbox::msgcnt(void) const {
return msg_list.size();
}
Definesmsgcnt
(links are to index).
Print the content for message mno
; don't forget to move mno
one to the
left so it's a sensible vector index.
<mailbox-string.cc
>+= [<-D->]
void mailbox::output(ostream & out, unsigned mno) const {
assert((1 <= mno) && (mno <= msgcnt()));
Msglst_citr mp = msg_list.begin();
while (mno-- > 1) mp++;
copy(
mp->content_start, mp->content_end, ostream_iterator<string>(out, "\n"));
}
Definesoutput
(links are to index).
Read into the mailbox mbox
the contents of the file named mbox_name
.
Return true if everything went ok, otherwise return false. mbox
ends with
a dummy message to simplify some of the code later on.
<mailbox-string.cc
>+= [<-D->]
bool mailbox::read(const char * mbox_name) {
ifstream mbox_file(mbox_name);
if (!mbox_file) return false;
string s;
while (getline(mbox_file, s))
mbox.push_back(s);
mbox.push_back("");
mbox.push_back("From ");
mbox.push_back("");
return true;
}
Definesread
(links are to index).
Parse the mailbox mbox
to find the messages. First, locate the starts of
all the messages (including the dummy message added by read()
. Second,
parse each message and store the message information in the message list.
parse_mailbox()
returns the empty string if everything went ok, otherwise,
it returns a string describing the problem.
<mailbox-string.cc
>+= [<-D->]
string mailbox::parse_mailbox(void) {
Msg_starts starts;
string resp = find_message_starts(starts);
if (resp != "") return resp;
const Msg_starts::const_iterator end = starts.end() - 1;
unsigned msgcnt = 0;
for (Msg_starts::iterator i = starts.begin(); i != end; i++) {
Message msg;
resp = parse_message(*i, *(i + 1), msg);
if (resp != "") return resp;
msg.original_order = msgcnt++;
msg_list.push_back(msg);
}
return "";
}
Definesparse_mailbox
(links are to index).
find_message_starts()
stores into starts
iterators pointing to the
first line of each message in mbox
. The first line of a message is the
line immediately after the line containing the start-of-message marker "From
". According to RFC 822, every message should contain at least one
text-line.
<mailbox-string.cc
>+= [<-D->]
static bool is_from(const string str) {
return (str.find("From ") == 0);
}
string mailbox::find_message_starts(Msg_starts & starts) {
Mailbox_itr start = mbox.begin();
const Mailbox_itr end = mbox.end();
while (true) {
start = find_if(start, end, is_from);
if (start == end) break;
starts.push_back(++start);
if (start == end)
return "Corrupted mailbox: the start-of-message marker is "
"not followed by a header line";
}
return "";
}
Definesfind_message_starts
,is_from
(links are to index).
Parse the message found in the mailbox iterator range (s, e)
, and store the
parsed message information in msg
.
The second iterator i in the iterator range indicating the last header field points to the blank line separating the header from the content; the content starts on the line following i.
The mailbox iterator e
points to the start of the next message, which means
the end of msg
is two lines (the blank line and the "From
") before the
line pointed to by e
. As a sanity check, the message header must contain
at least one field (RFC 822 requires that it contain at least three fields, but
one's enough here).
<mailbox-string.cc
>+= [<-D->]
string mailbox::parse_message(
Mailbox_itr & s, Mailbox_itr & e, Message & msg) {
string resp = parse_header(s, e, msg.header);
if (resp != "") return resp;
count_header_words(msg);
const unsigned hdrcnt = msg.header.size();
assert(hdrcnt > 0);
msg.content_start = msg.header[hdrcnt - 1].second + 1;
msg.content_end = e - 2;
return "";
}
Definesparse_message
(links are to index).
Given the message delimited by the iterator range (s, e)
, find the header
and store it in hedr
. Return the empty string if everything went well;
otherwise return a string describing the problem.
A line is unfolded if it doesn't start with a space or a tab.
<mailbox-string.cc
>+= [<-D->]
static bool is_empty(const string & str) {
return (str.length() == 0);
}
static bool is_unfolded(const string & str) {
return is_empty(str) || !index(" \t", str[0]);
}
string mailbox::parse_header(Mailbox_citr s, Mailbox_citr e, Header & hedr) {
if (!is_unfolded(*s))
return "Corrupted mailbox: message with first header line folded";
Mailbox_citr hdr_end = find_if(s, e, is_empty);
if (hdr_end == e)
return "Corrupted mailbox: message with no blank line after header";
if (hdr_end == s)
return "Corrupted mailbox: message with no header";
do {
e = find_if(s + 1, hdr_end, is_unfolded);
hedr.push_back(Header_field(s, e));
s = e;
}
while (s != hdr_end);
return "";
}
Definesis_empty
,is_unfolded
,parse_header
(links are to index).
Count the number of words appearing in each header-field text line; this is done a header field at a time. The first word in an header field is the label, which should not be included in the count.
<mailbox-string.cc
>+= [<-D->]
void mailbox::count_header_words(Message & m) {
for (Header_itr i = m.header.begin(); i != m.header.end(); i++) {
vector<string> wds;
listify_words(wds, i->first, i->second);
m.words.insert(wds.begin() + 1, wds.end());
}
}
Definescount_header_words
(links are to index).
Break up the header field (b, e)
into a sequence of words. A word is a
maximal contiguous sequence of alpha-numeric characters.
<mailbox-string.cc
>+= [<-D->]
static bool is_word_char(char c) {
return isalnum(c);
}
static bool is_nonword_char(char c) {
return !is_word_char(c);
}
void mailbox::listify_words(
vector<string> & words, const Mailbox_citr & b, const Mailbox_citr & e) {
for (Mailbox_citr i = b; i != e; i++) {
const string::const_iterator lend = i->end();
string::const_iterator wstart = find_if(i->begin(), lend, is_word_char);
while (wstart != lend) {
const string::const_iterator wend =
find_if(wstart, lend, is_nonword_char);
words.push_back(string(wstart, wend));
wstart = find_if(wend, lend, is_word_char);
}
}
}
Definescount_header_field_words
(links are to index).
Ordering the messages is done in two passes over the message list. The first pass marks the messages that should be moved to the head of the message list; the second pass does the moving.
The idea is to move all matching messages before all non-matching message,
which can be done by the partition()
generic algorithm. The requirement
that relative order be maintained can be handled by the stable_partition()
generic algorithm.
Both order()
and delete()
need to partition the message list using the
same criterion. delete()
could call order()
(or vice versa) to do the
partitioning, but delete()
needs to know the location of the partition,
and order()
can't return that information. partition_matches()
solves
this problem by doing the partition and returning the partition point;
order()
can ignore the return value and delete()
can use it.
<mailbox-string.cc
>+= [<-D->]
bool mailbox::reorder_msg(const Message & m) {
return m.matching_order == 0;
}
mailbox::Msglst_itr mailbox::partition_matches(
const string & field, const string & word) {
for (Msglst_itr i = msg_list.begin(); i != msg_list.end(); i++)
search_header(*i, field, word);
return stable_partition(msg_list.begin(), msg_list.end(), reorder_msg);
}
Definespartition_matches
,reorder_msg
(links are to index).
Order the message list so that all messages having word
in a header field
labeled field
appear before any message not having word
in a header
field labeled field
.
<mailbox-string.cc
>+= [<-D->]
void mailbox::order(const string & field, const string & word) {
partition_matches(field, word);
}
Definesorder
(links are to index).
Order the message list so that a message having more occurrences of word
in
its header appears before any message having fewer occurrences of word
in
its header.
<mailbox-string.cc
>+= [<-D->]
bool mailbox::msg_greaterthan(const Message & m1, const Message & m2) {
return m1.matching_order > m2.matching_order;
}
void mailbox::order(const string & word) {
for (Msglst_itr i = msg_list.begin(); i != msg_list.end(); i++)
i->matching_order = i->words.count(word);
msg_list.sort(msg_greaterthan);
}
Definesorder
(links are to index).
search_header_field()
returns true if the header field delimited by the
iterator range (s, e)
has the label label
and contains the word
word
. If the header field field
doesn't have the label label
or
doesn't contain the word word
, search_header_field()
returns false.
A little bit of care is needed to make sure the header-field label doesn't take
part in the search for word
; that's what start
does.
<mailbox-string.cc
>+= [<-D->]
bool mailbox::search_header_field(
const Header_field & hfield, const string & label, const string & word) {
Mailbox_citr s = hfield.first;
const Mailbox_citr e = hfield.second;
assert(s != e);
if (s->find(label) != 0) return false;
unsigned start = label.size();
if ((*s)[start] != ':') return false;
do {
if (s->find(word, start) != string::npos) return true;
start = 0;
}
while (++s != e);
return false;
}
Definessearch_header_field
(links are to index).
If message msg
's header contains a field labeled with label
and
containing the word word
in its text field, then search_header()
sets
msg.matching_order
to 0 to indicate that msg
should be re-ordered in
the message list. Otherwise msg.matching_order
is set to 1 to indicate
that msg
should be left alone in the message list.
<mailbox-string.cc
>+= [<-D->]
void mailbox::search_header(
Message & msg, const string & label, const string & word) {
assert(msg.header.size() > 0);
msg.matching_order = 1;
for (Header_itr i = msg.header.begin(); i != msg.header.end(); i++)
if (search_header_field(*i, label, word)) {
msg.matching_order = 0;
break;
}
}
Definessearch_header
(links are to index).
Sorting the message list on the original message order field in the messages
restores the original message order. The previous version of
restore_original_order()
used the sort()
generic algorithm, but because
sort()
can't work with lists, this version uses the sort()
member
function for lists.
<mailbox-string.cc
>+= [<-D->]
bool mailbox::original_msg_order(const Message & m1, const Message & m2) {
return m1.original_order < m2.original_order;
}
void mailbox::restore_original_order(void) {
msg_list.sort(original_msg_order);
}
Definesoriginal_msg_order
,restore_original_order
(links are to index).
To delete messages, move the messages to be deleted to the front of the message list (that is, order them) and then splice them into the deleted message list.
<mailbox-string.cc
>+= [<-D->]
void mailbox::delete_msgs(const string & field, const string & word) {
Msglst_itr i = partition_matches(field, word);
deleted_msgs.splice(deleted_msgs.end(), msg_list, msg_list.begin(), i);
}
Definesdelete_msgs
(links are to index).
To restore deleted messages, if any, splice them onto the end of the message list, which also empties the deleted-message list.
<mailbox-string.cc
>+= [<-D]
void mailbox::restore_deleted_msgs(void) {
msg_list.splice(msg_list.end(), deleted_msgs);
}
Definesrestore_deleted_msgs
(links are to index).
mailbox-string.cc
>: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15, D16, D17, D18, D19
mailbox-string.h
>: D1
main()
>: D1, U2
readmbox.cc
>: D1