Homework 6

Computer Networking

Homework 6, 4 December 2012

This homework assignment has five questions; answer all of them. This assignment is due no later than 7:00 p.m. on Thursday, 13 December.

If you mail in your assignment, please submit a printable document — a PostScript .ps or PDF .pdf document, for example — and not a source document — a Word .docx or Latex .tex document, for example.

Suppose a peer-to-peer network is based on a small-world topology. Describe an attack that does a large amount of damage to the network's capability for a small amount of work on the attack. Your description should include 1) the attack, 2) the expected damage to the network, and 3) how the damage effects the network's capabilities.
The small-world topology has a many close-by connections and fewer long-distance connections. A node in a small-world network is either directly connected to, or a few local hops from, a connection to a far-way node.
Attacking the long-distance connections would take less effort than attacking other connections because there are fewer long-distance connections (1). Eliminating long-distance collection would partition the network into a set of disconnected, small-capacity sub-networks (2). Such a disconnected network would have greatly reduced and fragmented storage capacity (3).

The answers, approximately verbatim, and in no particular order:
1. A possible attack on a peer-to-peer network would be to bring down the server.¹ The attack would need to cripple the server since it performs all of the routing and connections among the peers. Napster, for example,² would have been crushed by a server attack because the peers (users) wouldn't be able to contact each other and send/receive files.
  
  1: Do small-world peer-to-peer networks have servers?
  
  2: Is Napster based on a small-world topology?
2. In a small-world topology, there is a path between any two nodes in the network.³ Hence, the network is almost fully connected.⁴ To attack this topology, it is required to attack at least half of the nodes to do big damage to this network. Flooding the whole network is a possible attack to stop this network. However, attacking this type of network costs a lot and needs huge work.
  
  3: A network with that property is connected, but a connected network doesn't necessarily have a small-world topology.
  
  4: If there's a path between any two nodes, there's no “almost:” the network is fully connected.
3. i) Attack is a denial of service attack. \ ii) The expected damage to the network is a DOS attack, peer is overloaded by useless messages and its resources are wasted to perform meaningless tasks so that the node cannot serve its intended purpose.⁵ \ iii) For example, a malicious node can continuously send (or route) messages to a particular node. In this way, the targeted node's bandwidth is consumed just to transfer messages,⁶ rendering its shared resources (CPU and storage) unavailable to other nodes in the P2P network even though they may be in under-utilized. \ Reference: Peer-to-peer computing textbook.
  
  5: Ok, but how does this attack make use of the small-world topology.
  
  6: True, but the damage done by this attack seems linear in the number of nodes attacked, which doesn't provide large damage for small effort.
Two peer-to-peer networks N₁ and N₂ are identical except that N₁’s network topology has power-law characteristics and N₂’s network topology does not. Describe a query or series of queries that would demonstrate the differences between the two networks. Your description should include 1) the queries, 2) the expected results from the queries, and 3) how the results obtained distinguish the two networks.
The trick to this question is to define what network characteristic exhibits a power-law relation. However, in general, a network exhibiting the power law will have a small percentage (≤ 10%) of x (usually nodes) showing a large percentage (≥ 80%) of some other property y, such as neighbors or stored items.
The query depends on x and y, say x = nodes and y = neighbors. A series of queries to the nodes of each network about their neighbors (1) would identify a small set of nodes with many neighbors in one network that were missing in the other network (2). The identified power-law network would have a few nodes that are good starting places for queries with short expected search paths while the other network would no obvious starting points and correspondingly longer search paths (3).

The answers, approximately verbatim, and in no particular order:
1. 1 Queries
  a The rank and out-degree power laws are the most effective in distinguishing different kinds of topologies.
  b In a network topology has power-law characteristics then it should satisfy P(K) proportional to k power -y where P(K) is the probability that a node has k edges.¹
  2 Results Expected. Results show which network follows power law.²
  3 By the expected results we can distinguish two networks.³
  
  1: What are the queries?
  
  2: How are these results characterized? Will any results do?
  
  3: Distinguish how?
2. Power-law characteristics typically hold important parameters about the network.⁴ Some queries to display the differences could include average neighborhood size, how many peers exist, the frequency with which certain peers send/receive data, etc. The results from these queries would be the quantitative data that describes the network.⁵ These results could be used to help optimize the network and the distribution of data across the peers. N₂ would not benefit from this data (having no power-laws to obtain it), while N₁ would.
  Source: dx.doi.org/10.1145/316194.316229
  
  4: What parameters? How are they held?
  
  5: How would the data be characterized in general?
3. A power-law topology has the property that a small proportion of nodes have a high out-degree (i.e., have many connections to other nodes), while the vast majority of nodes have a low out-degree, (e.e., have connections to few nodes). Formally the frequency fd of nodes with out-degree d exhibits a power-low relationship of the form fd μ da, a < 0. Internet routing topologies⁶ follow this power-law relationship with a = -2. In other words, power-law implies that small occurrences are extremely common, whereas large instance are extremely rare. Exploiting power-law can increase the efficiency of search processes in networks. Since some nodes have more connections to other nodes, these nodes can be accessed faster and can have more valuable and frequent results that other nodes. Results for future searches can also be predicted according to power-law.
  (source: selfUFL(www.hpl.hp.com))
  
  6: Are Internet routing topologies related to peer-to-peer network topologies, which is what the question asked about?
16 copies of a file are stored in a connected, unstructured peer-to-peer network of 1024 nodes with each node having at most one copy of the file. A random node in the network issues a breath-first query for the file. What is the expected value of the distance in hops between the file and the query node? Justify your answer.
Assuming a random distribution of files to nodes, the probability that a node has a copy of the file is p = 16/1024 = 1/64 = 0.016, and that it doesn't have a file copy is q = 1 - 0.016 = 0.984. Assuming independence among nodes, which is a poor assumption given peer-to-peer routing algorithms, the probability that the file will be found after i hops is q^{i - 1}p. The number of paths of length i from a particular node is related to the average fan-out f: fⁱ. The expected value for path length is then the sum of fⁱq^{i -}p for i = 0 to 1024 - 16, assuming a connected network and depth-first search.
The answers, approximately verbatim, and in no particular order:
1. 16 copies across 1024 nodes with a most one copy per node yields a 1-in-64 chance of a given node containing the desired file (1024/16).¹ Given a connected, unstructured peer-to-peer network, the expected value of the distance in hops between the file and the query node would be 64.
  
  1: Chances can't be greater than 1, so it's 16/1024.
2. The basic idea of directed Breadth-First Search approach is that the query node sends the query message to a subset of its neighbors that will quickly return many high-quality results. These neighbors then forward the query message to all their neighbors.²
  In structured P2P systems,³ connections among peers in the network are fixed, and peers maintain information about the resources (e.g., shared content) that their neighbor peers possess. Hence, the data queries can be efficiently directed to the neighbor peers that have the desired data, even if the data is extremely rare.⁴
  Unlike structured P2P systems, in unstructured P2P systems, connections among peers in the network are formed arbitrarily in a flat or hierarchical manner. In order to find as may peers that have the desired content as possible, peers in unstructured P2P systems query data based on several techniques such as flooding, random walking, and expanding-ring.⁵
  
  2: If the neighbors return many high-quality results, why would they then forward the query to all their neighbors? Isn't the search done?
  
  3: The problem's concerned with unstructured peer-to-peer networks.
  
  4: Can “efficiently” be quantified in terms of the values given in the question?
  
  5: And what should be expected of these techniques given the values in the question?
3. In case of a Breadth-first query, when a node receives a query message, besides processing the query, it simply forwards the query to all its neighbors except the sender. So, in worst case, where one node has only one neighbor then flooding continues until it reaches the node which has the copy of the file, so the distance between the file and the query node will be 1024 - 16 - 1 = 1007 hops, at 1008^th node that file will present.⁶ Because, assume like file copies are preset at one end of the network and the query node is present at other end of the notwork then (1024 - 16) nodes doesn't have any copy of the file and one is the query node. \ Reference: Peer-to-peer computing textbook.
  
  6: That's the worst case; what's the expected (average) case? And what about networks that aren't linear?
The secure hash algorithm SHA-1 has been broken and is no longer considered reliable for use. Describe a potential consequence of continuing to use SHA-1 in a peer-to-peer network. You may answer in general terms (which is harder), or with reference to a specific peer-to-peer network (which is easier). Your answer should involve non-security uses of SHA-1 in the network. In addition to answering the question, your description should show that you understand 1) in what way SHA-1 is broken and 2) how a peer-to-peer network uses SHA-1 in non-security mechanisms.
Generally, hash functions are used in peer-to-peer networks to randomize locations, providing wide accessibility and load-balancing. The SHA-1 crack allows for generating collisions at will (1), which could be used to defeat both accessibility and load-balancing as well as node hijacking, depending on how the hash is used (2). Some networks, such as Freenet, that place a premium on privacy and anonymity may find it difficult to maintain those properties using a compromised hash function.
The answers, approximately verbatim, and in no particular order:
1. The SHA-1 algorithm is the base hashing function consistent hashing.¹ Consistent hashing is integral to the robustness ad performance of Chord because both keys ad IDs (IP address) are uniformly distributed ad in the same identifier space. Consistent hashing is also necessary to the nodes join and leave the network without disruption.² \ Reference: Wikipedia
  
  1: What?
  
  2: How does a cracked SHA-1 effect this?
2. SHA-1 has been broken to the point where the potential for collisions has been reduced dramatically. SHA-1 has used to authenticate transmissions (for example, song files in Napster). SHA-1 could be used to verify that a particular peer received the appropriate file — the right person's computer received the desired song file. Continuing to use the broken SHA-1 algorithm could result in incorrect files being sent or received. Viruses could pass for song files and the Napster example, etc.
  Source: news.techworld.com/security/3156/crypto-world-in-panic-as-sha-1-broken/
3. SHA-1 is being used in file sharing over peer-to-peer networks as a checksum mechanism to check whether the shared file is correct or corrupted.³
  It is also used in Chord server that has a unique 160-bit Chord node identifier which is produced by SHA-1 hash of the node's IP address.⁴
  Attacking SHA-1 was done by a Chinese research team, and the team had an attack finding collisions in 2⁶⁹ trials.
  The team has reduced the amount of time needed to find two documents with the same signature by a factor of more than 2000. This attack is described as a collision attack.
  (source: www.schnieier.com, www.dcs.ed.ac.uk)
  
  3: What is consequence to these uses that SHA-1 has been cracked?
  
  4: What is the consequence to Chord that SHA-1 has been cracked?
Do you think it would make sense to Chord-style finger tables to a classic (unstructured) Gnutella networks? Justify your answer. Your justification should show that you understand what finger tables do in Chord and how classic Gnutella works.
Finger tables would not be too useful in classic Gnutella because it doesn't have a meaningful addressing relation between items and nodes. Chord can relate items and nodes, so that an item identified as far away from a particular node n will probably be located in the neighborhood of a node that's also far away from n. In that case, a finger table, which allows for progressively larger jumps in the network, can be exploited to move the search rapidly to a neighborhood that should contain the search item.
Classic Gnutella has no such relation between items and nodes. There's no good reason to suppose that an item can be found in a particular neighborhood, and so there's no good reason to be able to jump directly to that neighborhood.

The answers, approximately verbatim, and in no particular order:
1. A finger table is a routing table with t entries. The finger table is maintained by each node in the network.¹ The ith entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^{(i - 1)} on the identifier circle. For example, s = successor(n + 2^{(i - 1)}), where 1 ≤ i ≤ t (and all arithmetic is modulo 2^t). The node s is called the ith finger of node n, and denoted by n.finger[i].node. The first finger is n is its immediate successor on the circle. In addition, the ith finger entry of node n contains the interval, [n.finger[i].node, n.finger[i + 1].node, called the ith finger interval of node n, and denoted by n.finger[i].interval.
  (source: Chord Implementation, www.dcs.ed.ac.uk)
  
  1: A Chord network, or a Gnutella network?
2. No, because Chord-style finger tables works with structured peer-to-peer network with a ring topology whereas classic Gnutella is an unstructured network with random topology where each peer has a look-up table which has the list of neighbors of itself (peer).² But if we can arrange peers in Gnutella, i.e., making a structured Gnutella then we can use Chord-style finger tables.³
  
  2: But what is it about ring structure that makes finger tables work?
  
  3: How would adding structure to a Gnutella network make finger tables useful? Is finger-table value based on structure, or some other property?
3. Classic Gnutella peer-to-peer networks function primarily as file transfer systems. Nodes are added by opening multiple connections with nodes already in the network. Nodes frequently are added and removed, so a dynamic and sacalable system would work best.⁴ Chord-style finger tables would work well for Gnutella networks assuming the sub-linear grouping approach is taken.⁵ Search protocols should not check each node linearly; rather, neighborhoods should be analyzed to avoid unnecessary checking.
  Source: cs-www.cs.yale.edu/homes/arvind/cs425/doc/gutella-rc.pdf
  
  4: What kind of system is it that should be dynamic and scalable? The finger table?
  
  5: Sub-linear means what?

This page last modified on 2012 December 16.