CS 512 Homework 1

m4_divert(2) m4_define(_commentnesting, 0) m4_define(_begincomment,m4_dnl m4_ifelse(_commentnesting, 0, )) m4_define(_warning,m4_dnl _begincomment()This document was created by gen-_name.html Do not edit this document, edit gen-_name.html._endcomment()) m4_define(_hdrs, ) m4_define(_anchor_no, 0) m4_define(_abs, m4_ifelse(m4_eval($1 < 0), 1, m4_eval(-($1)), $1)) m4_define(hdr, _warning() $2 _warning() m4_ifelse(m4_eval($1 >= 0), 1, m4_define(`_hdrs', _hdrs`'_anchor_no$1m4_translit(`$2', `,', ))) m4_define(_anchor_no, m4_eval(_anchor_no + 1)) ) m4_define(usetag, _warning()$2_warning()) m4_define(puttag, _warning()$2_warning()) m4_changecom() m4_define(sps, $1^$2) m4_define(sbs, $1_$2) m4_define(,_warning()$1_warning()) m4_define(,_warning()$1_warning()) m4_define(itl,_warning()$1_warning()) m4_define(bld,_warning()$1_warning()) m4_define(hrf,_warning()$2_warning()) m4_define(mailat,_warning()hrf(mailto:$1,$1)_warning()) m4_define(nwsgrp, hrf(news:$1, $1)) m4_define(bookref, $1, $2. $3: $4, $5.) m4_define(i2m,m4_ifelse( $1,01,January, $1,02,February, $1,03,March, $1,04,April, $1,05,May, $1,06,June, $1,07,July, $1,08,August, $1,09,September, $1,10,October, $1,11,November,December)) m4_dnl $Revision: 1.5 $, $Date: 1999/12/09 19:16:32 $ CS 512 Homework 1 hdr(2, Homework 1 - Graph and Geometric Algorithms) hdr(3, CS 512, Algorithm Design)

Design an algorithm that accepts an undirected graph G and returns the number of maximally connected components of G. A graph is connected if every pair of nodes in the graph is connected by a path. A (component) of a graph G is a maximally connected subgraph of G. A maximally connected subgraph M of a graph G is the largest possible connected subgraph of G; that is, there is no other connected subgraph of G that contains M as a subgraph. How would your algorithm change if it had to accept directed graphs?
The key to solving the first part of this problem is to recognize that a depth- or breadth-first traversal through an undirected graph results in a maximal, connected subgraph. That the traversal traces out a subgraph is clear; the subgraph is connected because the traversal only advances through the graph along edges; and the connected subgraph is maximal because if there was a connected subgraph larger than the one traced out by the traversal, there would have to be an edge from a marked node the smaller subgraph to an unmarked node in the larger subgraph, but the traversal would have discovered that edge to an unmarked node during its execution and would have added it to the subgraph it was creating.
The only part remaining is to recognize the traversal has to be performed on each unmarked node in the graph because the graph may be unconnected. Each traversal of the graph would be counted as one component of the graph.
This answer also appears in the text as Figure 7.5 on page 192, algorithm Connected_Components.
The point to the second part of this problem is figuring out what the components of a directed graph might be. One answer could be that components are not defined for directed graphs, and so there is no such algorithm. This is not a particularly good answer because it's hard to justify: it makes sense to talk about subgraphs of a directed graph, it makes sense to talk about connected subgraphs of a directed graph, and it makes sense to talk about maximally connceted subgraphs of a directed graph, co components seem to be well defined for directed graphs.
Another answer, just a little better than the previous one, is to define a new kind of connectivity for directed graphs. For example, given a directed graph G, a node n in G is v-connected to node v if there exists a path from v to n in G. A v-component of G would then be the maximal, v-connected subgraph rooted at v. This is not a bad answer because it reduces the problem of connectivity in a directed graph to the all-shortest paths problem. However, it's also not a very good answer either because it defines a set of new, non-standard terms, which is something that should be avoided unless absolutely necessary.
The best answer is to keep the usual definitions of component and the other terms and apply them to directed graphs. A depth- or breadth-first traversal of a directed graph still generates subgraphs. The connected part is a bit trickier: when is possible to always find a path between any two nodes in a set of nodes taken from a directed graph? A little thought shows that it is only possible when all nodes in the set are part of the same cycle (or, more generally, circuit). Depth-first traversals, by recognizing back edges, can find cycles and, with a bit more bookkeeping, find circuits too. A connected subgraph of a directed subgraph is maximal if every node in the graph is part of a circuit; otherwise, such a node would be unreachable from a circuit - making the subgraph unconnected - or would not be part of the circuit - making the subgraph be not maximal.
The Connected_components algorithm assumes a undirected graph, so it's not appropriate for directed graphs. Algorithm Find_a_Cycle in Figure 7.11 detects if a directed graph contains a cycle, but doesn't trace out the cycle. This, however is an easy enough modification to make, essentially by following the back-edge around again, tracing the nodes. Find_a_Cycle also stops at the first cycle it finds; fixing this too is easy.
How would you model a maze as a graph? Develop an algorithm that generates mazes using your model.
Assume a one-entrance, one-exit maze laid over the positive integer coordinates of the x-y plane. The size of the maze is n > 1 units along the x axis and m > 1 units along the y axis. The graph is rectilinear, which means all walls run north-south or east-west; there's no diagonal walls.
The undirected graph modeling the maze has n*m nodes. Each node has a name of the form xVy where 1 <= x <= n and 1 <= y <= m. The node xVy in the graph represents the position (x, y) in the maze. An edge between two nodes in the graph indicates there's an unobstructed path between the corresponding points in the maze. To preserve the maze's rectilinear structure, edges in the graph are constrained to obey the following property: an edge may connect the node xVy only to the four nodes x+1Vy, x-1Vy, xVy+1, and xVy-1. Two distinct the nodes in the graph are distinguished; both distinguished nodes have the form 1Vy, nVy, xV1 or xVm; that is, they are nodes on the sides of the graph. The two distinguished nodes are called the start and end nodes (it doesn't matter which is which).
A maze has a single path from the entrance to the exit; this is modeled in the graph by having a single path from the start to the end node. The representation is a bit more complicated than this however, because, for example, the maze path may contain a loop, which means there may be more than one path from entrance to exit (that is, the direct path omitting the loop, the path that includes one trip around the loop, the path that includes two trips around the loop, and so on). In addition, the maze path may fork into two paths and later merge the branches into a single path; in this case one path may take the left fork and another path may take the right fork. And, of course, the maze path may contain a loop and a fork, and perhaps more than one of each. Rather than complicate things needlessly, however, we'll assume that the exactly one simple path connects the start and end nodes.
Under these set of assumptions and conditions, an algorithm to generate a graph naturally falls into three parts: initialize the graph, construct a simple path from the start node to the end node, and then complicate the simple path to make the maze interesting:
```
graph make_maze(n, m: integer)

  initialize the graph
  generate the simple path from start to end
  complicate the simple path

  return the graph
```
"Initialize the graph" means create the vertices and select the start and end nodes. At this point note that the graph has no edges, just vertices.
"Generate the simple path" adds edges to the graph to form the simple path. There are a number of ways to do this; perhaps the simplest is the following (assume the start node is xVy and the end node is x'Vy':
```
if x' >= x then
  for i <- x to x' - 1 do
    add {iVy', i+1Vy'} to the edge set
else
  for i <- x to x' + 1 by -1 do
    add {iVy', i-1Vy'} to the edge set

if y' >= y then
  for i <- y to y' - 1 do
    add {x'Vi, x'Vi+1} to the edge set
else
  for i <- y to y' + 1 by -1 do
    add {x'Vi, x'Vi-1} to the edge set
```
This results in a particularly simple, L-shaped path from start to end. Creating a more complicated simple paths is possible, but because they wouldn't add anything new to the discussion, other possibilities won't be considered further.
"Complicate the simple path" adds filigrees to the simple path to make it more complicated. Perhaps the easiest way to do this is to pick nodes on the simple path and generate a dead-end from that path by randomly generating a path among the unused nodes (that is, the nodes that are not on the simple path and haven't been used in another dead-end. As with generating the simple path, there are more involved ways of generating complexities, but they won't be considered further here.
Given a set S of line segments, develop an algorithm to determine if S contains two parallel lines.
A property of parallel lines important here is that they create identical angles to a common third line (the x axis, for example). A simple algorithm to find parallel lines would intersect the lines in S with the x axis and find two lines that make the same angle with the x axis:
```
bool parallel_lines(S: set of lines)

  i <- 0
  for each l in S do
    angle[i] <- the counter-clockwise angle from the x axis to l;
    i <- i + 1

  sort angle

  for j <- 1 to i - 1 do
    if angle[j - 1] = angle[j] then return true

  return false
```
Alternatively, you could have used the property that parallel lines have the same slope (or you could have used the optimization described in Section 8.3 of the text, page 272) to simplify the calculations.
Write an algorithm that accepts the coordinates of a simple polygon on the integer plain (that is, the x-y plain with only integer coordinates), and then visits every point inside the polygon (this algorithm is the basis of the flood or fill operation in paint programs).
The two principles to bring to bear on this problem are scanning (Section 8.6 of the text) and inside-outside (Section 8.2 of the text). A scan line - parallel to the x axis, say - scans up (or down) coordinate by coordinate along the y axis. At each y coordinate, the scan line intersects the polygon at one or more places. Moving from left to right (or vice versa) along the scan line - that is, coordinate by coordinate along the x axis - produces a sequence of points along the scan line, each of which is either inside or outside the polygon:
```
scan_inside_polygon(vertices: array of vertex)

  xmin, xmax, ymin, ymax <- the bounding box enclosing the polygon

  for y <- ymin to ymax
    intersections <- the intersections between the scan line and the polygon's
		     sides
    intersections <- intersections sorted by increasing x value.
    for i <- 0 to |intersections| - 1 by 2 do
      for x <- ceiling(intersections[i]) to floor(intersections[i+1]) do
        touch point (x, y)
```
The inner for loop may need some explaining. The left-to-right scan along the scan line starts outside the polygon; ignoring special cases for the moment, the first intersection encountered along the scan line leads to the polygon's inside, the second leads back to the outside, the third leads back to the inside, and so on. With this view, intersections becomes a sequence of pairs (p1, p2)(p3, p4)...(pn-1 pn) with each pair (pi, pj) representing an entry into the polygon at pi and back out again at pj.
Even though the polygon's vertices and the scan line have integer coordinates, the intersections between the polygon's sides and the scan line may have non-integer coordinates. The floor() and ceiling() functions make sure the interior points have valid integer coordinates.
Finally, the special cases. There are two to worry about: the scan line intersects a polygon vertex, and the scan line overlaps a polygon side. Before considering these special cases, however, we need to determine if a point on the polygon's border is should be considered as part of the interior. Either choice is acceptable, but allowing points on the border to be part of the interior leads to a simpler algorithm, so it's the decision we'll make.
There are two further cases to consider when the scan line intersects a polygon vertex: when the two edges associated with the vertex straddle the scan line and when they don't. There is nothing special about the intersection when the edges straddle the scan line, and it can be added to the list of intersections without further processing. A little trick is required when the both sides lie on one side of the intersection: add the intersection twice to the list. This trick corresponds to thinking about the vertex intersection as entering (or leaving) the polygon at one side of the vertex and leaving (or entering) the polygon at the other side. A problem with this trick is that it may occasionally process a polygon vertex twice (when would that happen?), which may be objectionable. The fix is to add some extra bookkeeping to the algorithm (what kind of bookkeeping?), but because this extra bookkeeping doesn't add anything fundamentally new to the algorithm, it will not be considered further here.
There are also two further cases to consider when the scan line overlaps a polygon side: when the side continues an interior scan, and when the side ends an exterior scan and begins an interior scan. In the first case, just ignore the side; the points on the side representing interior points will be taken care of by the continuing interior scan. In the second case, add the side's endpoints to the list of intersections. This solution too has a problem: it assumes the inside-outside intersections are known while the intersections are being found, which, given the algorithm above, is not the case. The fix here too involves some extra bookkeeping, and here too the extra bookkeeping is not fundamentally interesting and will not be further considered.

This page last modified on 21 i2m(09) 1999. m4_dnl $Revision: 1.2 $, $Date: 1999/03/12 17:36:48 $ m4_divert(1) m4_changequote({, }) {m4_define(tblfcntnts, _toc_tableit(}_hdrs{) _toc_down(_toc_ilevel, _toc_level) ) m4_define(_toc_tableit, m4_regexp($1, $[0-9]+$$[0-9]+$$[^]+$$.*$, m4_ifdef(_toc_level, , m4_define(_toc_level, m4_eval(\2 - 1)) m4_define(_toc_ilevel, _toc_level)) _toc_entry(\1, \2, \3) _toc_tableit(\4)) ) m4_define(_toc_down, m4_ifelse(m4_eval($1 < $2), 1, _toc_down($1, m4_eval($2 - 1))) ) m4_define(_toc_entry, m4_ifelse(m4_eval($2 > _toc_level),1,

m4_translit($2, , ,) ) }