Extracting statements.


R. Clayton (rclayton@monmouth.edu)
(no date)


  Here's my extraction code

   std::istream & operator>>(std::istream &in,Instruction &i){
           std::string str;
           std::getline(in,str);
           std::istringstream istrm(str);
           istrm>>i.operation>>i.operand1>>i.operand2;
           return in;
   }

   I couldn't understand if this would chew blank lines.

It does, in a roundabout way. getline() preserves space characters except for
the line-ending newline character, which is stripped. Then you take str and
use it in an istringstream, from which you read the instruction parts. If str
is empty (contains no characters) or blank (contains only space characters),
the extraction for i.operation fails and the contents of i remain unchanged,
which explains why you got the previous instruction again.
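The effect is easy to reproduce. The sketch below keeps the extractor exactly as posted and adds two things that are not in the original message: an Instruction with three string fields (an assumed layout) and a hypothetical helper, second_read_repeats_first(), that extracts twice and compares the results.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Assumed layout: the message does not show Instruction, so three
// string fields are a guess.
struct Instruction {
  std::string operation, operand1, operand2;
};

// The extractor as posted, unchanged.
std::istream &operator>>(std::istream &in, Instruction &i) {
  std::string str;
  std::getline(in, str);
  std::istringstream istrm(str);
  istrm >> i.operation >> i.operand1 >> i.operand2;
  return in;
}

// Hypothetical helper: extracts twice from text and reports whether the
// second extraction left the instruction exactly as the first one set it.
bool second_read_repeats_first(const std::string &text) {
  std::istringstream in(text);
  Instruction i;
  in >> i;
  const Instruction first = i;
  in >> i;
  return i.operation == first.operation && i.operand1 == first.operand1 &&
         i.operand2 == first.operand2;
}
```

When the second line of text is empty or blank, the helper returns true: the second extraction succeeds as far as the caller can tell, yet i still holds the first instruction.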

The fix is as I described in the previous message: break your solution into
smaller pieces, so that reading blocks from input and turning blocks into
programs are distinct parts. To illustrate the wisdom of this approach,
let's try to massage your all-in-one solution to work correctly.

getline() could either fail or read a line. If it reads a line, the line could
be blank ("or empty" assumed from now on) or not. If getline() fails, in
already has its error bits set properly, so we could just return:

  std::string str;
  if (!std::getline(in, str))
    return in;

If str is blank, we have to do something, but in figuring out what we run into
another problem: blank lines are interpreted differently depending on where
they occur. Any blank line before the first program block should be ignored;
the first blank line after the program block should not be ignored, but any
following blank lines after that should be ignored.
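These three cases are simple to write down in code that sees whole lines and knows where it is in the input. The following is only a sketch of that idea, not the solution from the previous message: read_block() and split_blocks() are hypothetical names, and "blank" is assumed to mean nothing but spaces, tabs and carriage returns.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the lines of the next block in the input.
// Blank lines before the block are skipped; the first blank line after
// it ends the block. An empty result means no block was left to read.
std::vector<std::string> read_block(std::istream &in) {
  std::vector<std::string> block;
  std::string line;
  while (std::getline(in, line)) {
    const bool blank = line.find_first_not_of(" \t\r") == std::string::npos;
    if (!blank)
      block.push_back(line);
    else if (!block.empty())
      break;  // first blank line after the block
    // else: blank line before the block, ignore it
  }
  return block;
}

// Hypothetical convenience wrapper: splits a whole text into blocks.
std::vector<std::vector<std::string>> split_blocks(const std::string &text) {
  std::istringstream in(text);
  std::vector<std::vector<std::string>> blocks;
  for (auto b = read_block(in); !b.empty(); b = read_block(in))
    blocks.push_back(b);
  return blocks;
}
```

Blank lines that follow the terminating one are never treated specially here; they are either left unread or skipped as leading blank lines by the next call to read_block().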

Unfortunately, we don't have enough context within the instruction extractor to
know which of these cases (before the first block, first after the first block,
not first after the first block) holds. To solve this problem we either have to
pass the context into the instruction extractor or have the instruction
extractor pass its status back so the caller (which presumably knows the
context) can make the decision.

Passing extra context into the extractor is ugly all around. It requires
either a global variable or an extra parameter, in which case the extractor
operator (which must be binary) has to turn into a non-extractor function call.

Passing status back to the caller is only slightly less ugly. If we don't care
about using extraction, we can add a reference to a status parameter. If we
want to keep the extractor, then we have to use either the istream or the
instruction to pass back the status. Probably the easiest way to do that is to
use the instruction: if the op-code is blank, then the last line read was
blank.

  if (is_blank_line(str)) {
    i.operation = "";
    return in;
  }

We could use the istream's status bits to pass status back, but that would be
more complicated. We have to maintain EOF, which leaves the bad and fail bits.
Arbitrarily adopting fail to mean blank line would leave bad to indicate all
other errors. In addition, doing the bit manipulation on the istream is a
pain (don't forget to change the "just return" part above).

It seems we can solve this problem by passing back a special instruction value
to indicate a blank line. "What's wrong with that?" you may be saying to
yourself. "It doesn't seem too bad." Before going on to describe what's wrong
with it, it might be useful to tally the damage done already.

First, extraction doesn't behave like it used to, particularly with respect to
handling space characters. Having non-standard stream extraction behavior can
confuse your readers (and possibly other code). Second, understanding
instructions is now more complicated, because we have to deal with the
possibility of malformed instructions (with no op-code) that aren't really
malformed under some circumstances. Third, don't forget about the read loop.
By handing off the decision about blank lines to the caller, the read loop
becomes more complicated.
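To make the third point concrete, here is roughly what the patched extractor and its read loop might look like. This is a sketch under assumptions: Instruction is given three string fields, is_blank_line() is given a plausible definition, and read_program() is a hypothetical name for the caller's loop.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed layout; the message does not show Instruction.
struct Instruction {
  std::string operation, operand1, operand2;
};

// Assumed definition: true if s is empty or holds only spaces, tabs
// or carriage returns.
bool is_blank_line(const std::string &s) {
  return s.find_first_not_of(" \t\r") == std::string::npos;
}

// The patched extractor: a blank line comes back as an empty op-code.
std::istream &operator>>(std::istream &in, Instruction &i) {
  std::string str;
  if (!std::getline(in, str))
    return in;
  if (is_blank_line(str)) {
    i.operation = "";
    return in;
  }
  std::istringstream istrm(str);
  istrm >> i.operation >> i.operand1 >> i.operand2;
  return in;
}

// Hypothetical read loop: collects the instructions of the next block.
std::vector<Instruction> read_program(std::istream &in) {
  std::vector<Instruction> program;
  Instruction i;
  while (in >> i) {
    if (!i.operation.empty())
      program.push_back(i);  // a real instruction
    else if (!program.empty())
      break;                 // first blank line after the block
    // else: blank line before the block, skip it
  }
  return program;
}
```

The loop now has to know that an empty op-code is a signal rather than an instruction, and it has to track whether the block has started, which is exactly the context the extractor lacked.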

But the real problem with this solution (with all possible solutions discussed)
is that we're not done yet. The statement

  istrm >> i.operation >> i.operand1 >> i.operand2;

is incorrect because it accepts (for example) the malformed input "move "
(notice that operand1 and operand2 are unchanged, which makes this a
particularly noxious error to find). Now we have even more status to shuffle
around, which we can do using, for example, exceptions, but things are getting
complicated, and we're probably not yet done patching things up either.
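For comparison, a stricter way to turn one line into an instruction might look like the sketch below. It rests on assumptions the message does not confirm: that every instruction has exactly two operands, that anything after the second operand is an error, and that Instruction has three string fields. parse_instruction() is a hypothetical name.

```cpp
#include <sstream>
#include <string>

// Assumed layout; the message does not show Instruction.
struct Instruction {
  std::string operation, operand1, operand2;
};

// Hypothetical helper: fills i only if line holds exactly three fields.
// On failure i is left untouched and false is returned, so the caller
// can tell a malformed line from a good one.
bool parse_instruction(const std::string &line, Instruction &i) {
  std::istringstream istrm(line);
  Instruction parsed;
  std::string extra;
  if (!(istrm >> parsed.operation >> parsed.operand1 >> parsed.operand2))
    return false;  // fewer than three fields, for example "move "
  if (istrm >> extra)
    return false;  // more than three fields
  i = parsed;
  return true;
}
```

Parsing into a temporary and copying only on success means a malformed line can no longer leave i half updated, with a new op-code and stale operands.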


