<public main procedures>= (U→) int main() { std::string emsg; const nary_tree t = input_tree(std::cin, emsg); if (not emsg.empty()) { std::cerr << "! " << emsg << ".\n"; exit(EXIT_FAILURE); } path_pattern pp; while (input_path_pattern(std::cin, pp)) output_matches(std::cout, match_paths(t, pp)); }
Definesmain
(links are to index).
Unlike previous assignments, in which input and output were simple and obvious, input for this assignment isn't quite so clear (reading path patterns and writing path matches remain simple and obvious). In particular, how should the node information read determine how a tree gets built?
The first node read, if any, is the root of the entire tree; suppose it was read with i indentation spaces. The next node read, if it exists, has to have an indentation of j > i; otherwise the second node would be a sibling or parent of the root, which is impossible. This makes the second node a child of the root.
The third node read, if it exists, has indentation k. If k > j, then the third node is the root of a child for the second node. If k = j, then it's a sibling to the second node, which makes it a child of the second node's parent (the root, in this case). If k < j, then the third node is the root of a child for some ancestor of the second node (which, in this case, be the root again as the only ancestor of the second node; in general they may be more than one ancestor). The only case the second node can deal with is the first one, when the third node is the root of a child (k > j). The other two cases have to be handed up to an ancestor of the second node for handling.
Those observations, and the recursive nature of the trees, leads to the
input_tree()
procedure: read a node n to serve as the root of a
subtree. Keep reading nodes and adding the subtrees as n's children as
long as the new nodes have an indentation larger than n's. Otherwise stop
reading nodes and return, leaving an ancestor to handle the next node.
<private input-tree procedures>= (U→) [D→] static nary_tree tree_input(std::istream & is, std::string & emsg) { node_data nd; const bool b = read_node_data(is, nd); assert(b); const unsigned indent = nd.indent; nary_tree t = nary_tree(nd.label); while (peek_node_data(is, nd) ∧ (nd.indent > indent)) t.add_child(tree_input(is, emsg)); return t; }
Definestree_input
(links are to index).
Note the use of peek_node_tree()
to look at the next node on input without
reading it from intput (at least conceptually).
The public tree-input procedure should check for nodes before building the tree and check for no more nodes after building the tree.
<public input-tree procedures>= (U→) nary_tree input_tree(std::istream & is, std::string & emsg) { node_data nd; if (not peek_node_data(is, nd)) return nary_tree(); const nary_tree t = tree_input(is, emsg); if (peek_node_data(is, nd)) emsg = "The root has a sibling or a parent"; return t; }
Definesinput_tree
(links are to index).
The only other major subproblem left unsolved is searching the tree for pattern matches. Again, understanding the problem leads most of the way to a solution. Given a path pattern p1, p2, …, pn, find all subpaths n1, n2, …, nn on any path from the root to any leaf such that pi = ni for 0 < i ≤ n.
<private path-matching procedures>= (U→) static void find_matches( const nary_tree & t, const path_pattern & pp, unsigned current, const std::string & partial, path_matches & matches) { // Search the given tree for all paths that match the given pattern starting // at the given index and using the given partial path match. Store any // complete path matches found in the given reference. const unsigned pattern_size = pp.size(); assert(current < pattern_size); const std::string & label = t.data(), & pattern = pp.get(current), ptl = partial + (partial.empty() ? "" : " ") + label; assert(not label.empty()); if ((label == pattern) ∨ ("*" == pattern)) if (++current == pattern_size) matches.add(ptl); else for (unsigned i = 0; i < t.child_cnt(); i++) find_matches(t.get_child(i), pp, current, ptl, matches); for (unsigned i = 0; i < t.child_cnt(); i++) find_matches(t.get_child(i), pp, 0, "", matches); }
Definesfind_matches
(links are to index).
The find_matches()
prototype is a little shaggy and should hidden behind a
more presentable public interface function.
<public path-matching procedures>= (U→) path_matches match_paths(const nary_tree & t, const path_pattern & pp) { path_matches matches; if (not t.is_empty()) find_matches(t, pp, 0, "", matches); return matches; }
Definesmatch_paths
(links are to index).
<private main procedures>= (U→) [D→] static bool input_path_pattern(std::istream & is, path_pattern & pp) { // Read from the given input stream the next path pattern and store it in the // given reference. Return true if everything went without error, false // otherwise. std::string s; if (string_utils::read_next_nonblank(is, s)) pp = slist_utils::tokenize(s); return is.good(); }
Definesinput_path_pattern
(links are to index).
<private main procedures>+= (U→) [←D] static void output_matches(std::ostream & os, const path_matches & matches) { // Write to the given output stream the given paths, one path per line. for (unsigned i = 0; i < matches.size(); i++) os << matches.get(i) << "\n"; }
Definesoutput_matches
(links are to index).
<main.cc
>=
#include <iostream>
#include "input-tree.h"
#include "match-paths.h"
#include "path-pattern.h"
#include "slist-utils.h"
#include "path-matches.h"
#include "string-utils.h"
<private main procedures>
<public main procedures>
Definesmain.cc
(links are to index).
Tree Input
The main tree-input procedures were explained above. The
only other interesting thing about tree input is the need to support peeking at
the next available node information without (conceptually) reading it.
Peeking without reading can be simulated with a one-slot buffer that holds the
most recently read node information.
<private input-tree data>= (U→) static node_data nd_buffer; static bool buffer_full = false, no_more_data = false;
Peeking then consists of reading node information into the buffer if necessary and returning the buffer's contents.
<private input-tree procedures>+= (U→) [←D→] static bool peek_node_data(std::istream & is, node_data & td) { // Read, but do not remove, from the given input stream tree data and store // it in the given reference; return true. If anything untoward happens, // store an error message in the given string reference and return false. if (not buffer_full) { if (not read_node_data(is, nd_buffer)) return false; buffer_full = true; } td = nd_buffer; return true; }
Definespeek_node_data
(links are to index).
Reading and returning node information requires being aware of the buffer.
<private input-tree procedures>+= (U→) [←D→] static bool read_node_data(std::istream & is, node_data & nd) { // If there's tree data available in the buffer, store it in the given // reference and return true, emptying the buffer. Otherwsie read from the // given input stream tree data and store it in the given reference; return // true. If anything untoward happens, store an error message in the given // string reference and return false. if (buffer_full) { nd = nd_buffer; buffer_full = false; return true; } else return rtd(is, nd); }
Definesread_node_data
(links are to index).
<input-tree.h
>=
#ifndef _input_tree_h_defined_
#define _input_tree_h_defined_
#include <istream>
#include <string>
#include "nary-tree.h"
// Read from the given input stream an n-ary tree; return the tree. If
// anything untoward happens, store a suitable error message in the given
// string reference; otherwise the string reference is untouched.
extern nary_tree input_tree(std::istream &, std::string &);
#endif
Definesinput-tree.h
(links are to index).
<private input-tree procedures>+= (U→) [←D] static bool rtd(std::istream & is, node_data & td) { // Read from the given input stream tree data and store it in the given // reference; return true. If anything untoward happens, store an error // message in the given string reference and return false. std::string str; if (no_more_data or not std::getline(is, str)) return false; if (string_utils::is_blank_line(str)) { no_more_data = true; return false; } const unsigned start = str.find_first_not_of(" "), end = str.find_last_not_of(" \t"); assert(start ≠ std::string::npos); assert(end ≠ std::string::npos); td = node_data(start, str.substr(start, (end + 1) - start)); return true; }
Definesrtd
(links are to index).
<input-tree.cc
>=
#include <cassert>
#include <iostream>
#include "input-tree.h"
#include "string-utils.h"
struct node_data {
unsigned indent;
std::string label;
node_data() : indent(0), label("") { }
node_data(unsigned i, const std::string & s) : indent(i), label(s) { }
};
<private input-tree data>
// Deja vu c++ style.
static bool read_node_data(std::istream &, node_data &);
static bool peek_node_data(std::istream &, node_data &);
static bool rtd(std::istream &, node_data &);
static nary_tree tree_input(std::istream &, std::string &);
<public input-tree procedures>
<private input-tree procedures>
#ifdef INPUTTREETESTING
// g++ -g -o inputtreetesting -DINPUTTREETESTING -ansi -pedantic -Wall input-tree.cc nary-tree.cc && ./inputtreetesting
#include <sstream>
int main() {
std::istringstream iss("hello");
node_data td;
std::string emsg;
assert(rtd(iss, td));
assert(0 == td.indent);
assert("hello" == td.data);
iss.str(" hello");
iss.clear();
assert(rtd(iss, td));
assert(2 == td.indent);
assert("hello" == td.data);
iss.str(" green");
iss.clear();
assert(peek_node_data(iss, td));
assert(4 == td.indent);
assert("green" == td.data);
assert(read_node_data(iss, td));
assert(4 == td.indent);
assert("green" == td.data);
assert(not read_node_data(iss, td));
iss.str("red\n white\n blue\n black\n green\n");
iss.clear();
nary_tree t = input_tree(iss, emsg);
assert(emsg.empty());
}
#endif
<match-paths public interface>= (U→) // Search the given tree for all paths that match the given pattern; return the // matches. extern path_matches match_paths(const nary_tree &, const path_pattern &);
<match-paths.h
>=
#ifndef _match_paths_h_defined_
#define _match_paths_h_defined_
#include "nary-tree.h"
#include "path-pattern.h"
#include "string-list.h"
#include "path-matches.h"
<match-paths public interface>
#endif
Definesmatch-paths.h
(links are to index).
<match-paths.cc
>=
#include <cassert>
#include <iostream>
#include "match-paths.h"
<private path-matching procedures>
<public path-matching procedures>
#ifdef MATCHPATHSTESTING
// g++ -g -o matchpathstesting -DMATCHPATHSTESTING -ansi -pedantic -Wall match-paths.cc nary-tree.cc slist-utils.cc && ./matchpathstesting
#include "slist-utils.h"
#include <sstream>
int main() {
std::string word = "hello";
nary_tree t1(word);
path_matches matches = match_paths(t1, slist_utils::tokenize(word));
assert(matches.size() == 1);
assert(matches.get(0) == word);
word = "good-bye";
matches = match_paths(t1, slist_utils::tokenize(word));
assert(matches.size() == 0);
nary_tree t2(word);
t1.add_child(t2);
matches = match_paths(t1, slist_utils::tokenize(word));
assert(matches.size() == 1);
assert(matches.get(0) == word);
word = "good-bye hello";
matches = match_paths(t1, slist_utils::tokenize(word));
assert(matches.size() == 0);
word = "hello good-bye";
matches = match_paths(t1, slist_utils::tokenize(word));
assert(matches.size() == 1);
assert(matches.get(0) == word);
matches = match_paths(t1, slist_utils::tokenize("* good-bye"));
assert(matches.size() == 1);
assert(matches.get(0) == word);
matches = match_paths(t1, slist_utils::tokenize("hello *"));
assert(matches.size() == 1);
assert(matches.get(0) == word);
matches = match_paths(t1, slist_utils::tokenize("* *"));
assert(matches.size() == 1);
assert(matches.get(0) == word);
matches = match_paths(t1, slist_utils::tokenize("hello good-bye hello"));
assert(matches.size() == 0);
nary_tree t3("good-bye");
t3.add_child(t1);
word = "good-bye hello good-bye";
matches = match_paths(t3, slist_utils::tokenize(word));
assert(matches.size() == 1);
assert(matches.get(0) == word);
matches = match_paths(t3, slist_utils::tokenize("* hello good-bye"));
assert(matches.size() == 1);
assert(matches.get(0) == word);
t1 = nary_tree("red");
t1.add_child(nary_tree("white"));
t1.add_child(nary_tree("blue"));
t1.add_child(nary_tree("green"));
matches = match_paths(t1, slist_utils::tokenize("red ** green"));
assert(matches.size() == 1);
assert(matches.get(0) == "red white blue green");
}
#endif
Definesmatch-paths.cc
(links are to index).
<nary-tree public interface>= (U→) // Add the given tree as a child of this tree. void add_child(const nary_tree &); // Return the number of children. unsigned child_cnt() const; // Return the data associated with the root of this nary tree. const std::string & data() const; // Return the child at the given index. nary_tree get_child(unsigned) const; // Return true iff this tree is empty. bool is_empty() const; // Create a tree with no children and holding no data. nary_tree(); // Create a tree with no children and holding the given data. nary_tree(const std::string &); // Get rid of this nary tree. ~nary_tree(); // Create a new n-ary tree equal to the given tree. nary_tree(const nary_tree &); // Set this nary tree to be equal to the given nary tree. nary_tree & operator = (const nary_tree &);
<nary-tree.h
>=
#ifndef _nary_tree_h_defined_
#define _nary_tree_h_defined_
#include <string>
#include "sequence.h"
class nary_tree {
public:
<nary-tree public interface>
private:
class tree_node {
public:
std::string data;
sequence<tree_node *> children;
tree_node(const std::string & d) : data(d) { }
~tree_node() {
for (unsigned i = 0; i < children.size(); i++)
delete children.get(i);
}
private:
tree_node(const tree_node &) { }
tree_node & operator = (const tree_node &) { return *this; }
};
tree_node * root;
tree_node * copy_tree(const tree_node * const);
nary_tree(const tree_node * const);
};
#endif
Definesnary-tree.h
,nary_tree
(links are to index).
<nary-tree.cc
>=
#include "nary-tree.h"
void nary_tree::
add_child(const nary_tree & child) {
assert(root);
root→children.append(copy_tree(child.root));
}
unsigned nary_tree::
child_cnt() const {
return root ? root→children.size() : 0;
}
const std::string & nary_tree::
data() const {
assert(root);
return root→data;
}
nary_tree::tree_node * nary_tree::
copy_tree(const tree_node * const root) {
// Return a copy of the tree with the given root.
if (root) {
tree_node * const r = new tree_node(root→data);
for (unsigned i = 0; i < root→children.size(); i++)
r→children.append(copy_tree(root→children.get(i)));
return r;
}
return 0;
}
nary_tree nary_tree::
get_child(unsigned i) const {
assert(root);
return nary_tree(root→children.get(i));
}
bool nary_tree::
is_empty() const {
return not root;
}
nary_tree::
nary_tree(const std::string & d) :
root(new tree_node(d)) {
}
nary_tree::
nary_tree() : root(0) {
}
nary_tree::
nary_tree(const tree_node * const r) : root(copy_tree(r)) {
}
nary_tree::
nary_tree(const nary_tree & r) : root(copy_tree(r.root)) {
}
nary_tree::
~nary_tree() {
delete root;
root = 0;
}
nary_tree & nary_tree::
operator = (const nary_tree & rhs) {
if (this ≠ &rhs) {
delete root;
root = copy_tree(rhs.root);
}
return *this;
}
#ifdef NARYTREETESTING
// g++ -g -o narytreetesting -DNARYTREETESTING -ansi -pedantic -Wall nary-tree.cc && ./narytreetesting
int main() {
nary_tree t("hello");
t.add_child(nary_tree("green"));
t.add_child(nary_tree("red"));
assert(2 == t.child_cnt());
assert("hello" == t.data());
assert("green" == t.get_child(0).data());
assert("red" == t.get_child(1).data());
}
#endif
Definesnary-tree.cc
(links are to index).
Path Patterns and Matches
A path pattern is nothing more than a string list in which each string is a
label pattern.
<path-pattern.h
>=
#ifndef _path_pattern_h_defined_
#define _path_pattern_h_defined_
#include "string-list.h"
typedef string_list path_pattern;
#endif
Definespath-pattern.h
,path_pattern
(links are to index).
Path matches are stored as strings in a set, which is just a string list that doesn't add strings that are already in the list.
<path-matches.h
>=
#ifndef _path_matches_h_defined_
#define _path_matches_h_defined_
#include "string-list.h"
class path_matches {
public:
// Add the given match to this match set if it isn't already there; do
// nothing otherwise.
void add(const std::string & s) {
if (not matches.has(s)) matches.append(s);
}
// Return the number of matches stored in this match set.
const std::string & get(unsigned i) const { return matches.get(i); }
// Return the number of matches stored in this match set.
unsigned size() const { return matches.size(); }
private:
string_list matches;
};
#endif
Definespath-matches.h
,path_matches
(links are to index).
Sequences and String Lists
The sequence
class is a reimplementation of string-list
as a template
class so it can be used to create sequences of any type of data values (as long
as the type supports assignment).
<sequence.h
>=
#ifndef _sequence_h_defined_
#define _sequence_h_defined_
#include <cassert>
template <typename T>
class sequence {
public:
void
append(const T & data) {
// Add the given element to the end of this sequence.
assert(element_cnt ≤ max_elements);
if (element_cnt == max_elements) {
max_elements *= 2;
T * sa = new T [max_elements];
for (unsigned i = 0; i < element_cnt; i++)
sa[i] = elements[i];
delete [] elements;
elements = sa;
}
elements[element_cnt++] = data;
}
const T &
get(unsigned i) const {
// Return the element at the given index in this sequence.
assert(i < element_cnt);
return elements[i];
}
bool
has(const T & e) const {
// Return true if this sequence has the given element.
for (unsigned i = 0; i < element_cnt; i++)
if (e == elements[i])
return true;
return false;
}
sequence &
operator = (const sequence & s) {
// Copy the value of the given sequence into this sequence.
if (this ≠ &s) {
delete [] elements;
max_elements = s.max_elements;
element_cnt = s.element_cnt;
elements = new T [max_elements];
for (unsigned i = 0; i < element_cnt; i++)
elements[i] = s.elements[i];
}
return *this;
}
sequence() :
// Return an empty sequence.
max_elements(16),
element_cnt(0),
elements(new T [max_elements]) {
}
sequence(const sequence & s) :
// Return a new sequence equal to the given sequence.
max_elements(s.max_elements),
element_cnt(s.element_cnt),
elements(new T [max_elements]) {
for (unsigned i = 0; i < element_cnt; i++)
elements[i] = s.elements[i];
}
~sequence() {
// Get rid of this sequence.
delete [] elements;
element_cnt = max_elements = 0;
elements = 0;
}
unsigned
size() const {
// Return the number of elements in this sequence.
return element_cnt;
}
private:
unsigned max_elements, element_cnt;
T * elements;
};
#endif
Definessequence
,sequence.h
(links are to index).
<sequence.cc
>=
#ifdef TESTINGSTRINGLIST
#include <cassert>
#include <string>
#include "sequence.h"
// g++ -o testing-stringlist -g -DTESTINGSTRINGLIST -ansi -pedantic -Wall -Weffc++ sequence.cc && ./testing-stringlist
int
main() {
sequence<std::string> sl;
assert(0 == sl.size());
const std::string s = "hello";
sl.append(s);
assert(1 == sl.size());
assert(sl.has(s));
assert(not sl.has(s + " world"));
assert(sl.get(0) == s);
}
#endif
A string list is now a sequence of strings.
<string-list.h
>=
#ifndef _string_list_h_defined_
#define _string_list_h_defined_
#include <string>
#include "sequence.h"
typedef sequence<std::string> string_list;
#endif
Definesstring-list.h
,string_list
(links are to index).
input-tree.cc
>: D1
input-tree.h
>: D1
main.cc
>: D1
match-paths.cc
>: D1
match-paths.h
>: D1
nary-tree.cc
>: D1
nary-tree.h
>: D1
path-matches.h
>: D1
path-pattern.h
>: D1
sequence.cc
>: D1
sequence.h
>: D1
string-list.h
>: D1