- This problem is related to the rope data structure.
- A rope is a heavy duty string, efficient for large numbers of
characters.
- An SGI extension to the STL, a possible addition to the standard.
- An exercise in asymptotic analysis and code development.
- Operations require sub-linear behavior.
- True performance comes from asymptotic behavior, the rest is just
detail.
- The big idea here was sub-linear time.
- Without recognizing the sub-linear requirement, you can't solve the
problem.
- Most people failed to solve the problem.
- What is sub-linear behavior?
- Anything that grows less quickly than linear growth.
- Copying is necessarily a linear operation.
- A successful copy has to touch all n elements.
- If copying is linear, and you need to be sub-linear, what do you do?
- You don't copy; it's too expensive.
- If you don't copy, what do you do?
- You have to share.
- Sharing data involves pointer manipulation, which is a constant-time
operation.
- Slicing
- Catenation
- Assignment
- Shared data can't be changed; it unexpectedly changes other arrays.
- Element writing.
- Shared data has to be managed carefully.
- No garbage and no dangling pointers.
- In C++, this can be done with reference counting.
- When the count drops to zero, free the data.
- This is a pain to get right.
- Each array sequence has zero or more data sequences.
- Element indexing requires skipping over data sequences.
- "Skipping over" is a linear operation. Oops.
- Vectors and lists are not sub-linear data structures.
- You may be forced to touch all n elements.
- Trees are sub-linear data structures.
- Top to bottom traversal takes O(log n).
- A bounded but arbitrary number of traversals per operation is still
O(log n).
- In the average case, which is good enough for this assignment.
- Use trees rather than vectors or lists to keep track of the data.
- Catenation
- Slicing
struct seq {
// The element type.
typedef int etype;
// The number of pointers.
unsigned references;
// The data stored.
const std::vector<etype> data;
}
// An empty sequence
seq() : references(0) { }
// A single-element sequence.
seq(etype e) : references(0), data(e) { }
// A range sequence.
seq(
const etype * const b,
const etype * const e)
: references(0), data(b, e) {
}
- These were allowed to be linear.
// The element count.
unsigned size(void) const
return data.size()
// Point lhs to the same thing as rhs.
friend void
seq_assign(
seq * & lhs, const seq * const rhs)
if (lhs == rhs)
return
if (lhs != 0) {
assert(lhs->references > 0)
if (--(lhs->references) == 0)
delete lhs
}
lhs = const_cast<seq *>(rhs)
if (lhs != 0)
++(lhs->references)
- Note the reference counting and
const
trickery.
// Can't assign one sequence to another.
seq & operator = (const seq &)
return *this;
// Can't copy one sequence from another.
seq(const seq &) { }
// Can't delete a sequence.
~seq() {
assert(references == 0);
}
- Making these member functions private makes them unavailable for use,
even by the compiler.
- These can still be used inside the
seq
class.
- All
seq
s must come off the heap.
struct data_seq
const seq * const data;
unsigned b, e;
data_seq(const data_seq & ds)
: b(ds.b), e(ds.e)
seq_assign(const_cast(data), ds.data)
~data_seq
seq_assign(const_cast(data), 0)
unsigned size(void) const
return e - b;
- The same sequence can support different data sequences.
- The values of
b
and e
will differ.
This page last modified on 27 November 2002.