The Final Flight of the Harvster

Ben-Hur "Harv" C. Viray. MS Computer Science candidate. Network and Distributed Systems Group. University of the Philippines, Diliman

Thursday, February 17, 2005

Thesis Summary

Problem: Loosely structured peer-to-peer systems (LSP2P) not scalable enough

Solution:
  • Distributed flow control
  • Topology construction algorithm
  • Best neighbor replication strategy
How?
  • Link of peer with highest outgoing rate (most number of requests) get redirected to the peer with the highest spare capacity (both peers are neighbors of the requested peer)
  • Requested data is replicated at the peer with the highest spare capacity (the best neighbor)
Approach:
  • Compare a LSP2P system (Anthill's Gnutant? Freenet?) with a modified one (Harvster?)
  • To determine search success rate: # of searches vs. # of successful searches
  • Average search depth: #hops vs. #searches (Wonder if I can track more than a hop)
  • Documents (bytes) copied
Tools:
  • Anthill - discontinued in favor of a more scalable and lightweight simulator called Peersim, developed under the BISON project. Heavily lacks documentation and tutorials!
  • Other simulation tools?
Strategy:
  • Populate peers using a simulation model
  • Analyze using random loads and get statistics (graph 'em perhaps)
  • Change protocols
  • Parameters
    • Outgoing load
    • Capacity
    • Neighbor + load (sum of outgoing load)
    • Neighbor load (total)
    • Neighbor capacity
    • Load priority (depends on priority strategy chosen)
    • Bandwidth
    • Distance
Concerns:
  • How do we determine load metrics?
    • # of files? - maybe not
    • # of bytes?
  • Bandwidth and distance (not sure if this is needed) not important because we are after scalability?
  • To simulate a real environment, we will use the bandwidth distribution used in Gnutella (research more on this)
  • How do you define capacities? Random?
  • How do you define priorities for preemptions?
    • FCFS
      • as a whole (total # of bytes) [maybe not] or individual (bytes)
      • a request can be any # of bytes so to avoid confusion, we'll use bytes
      • preempt last ones until capacity is met
    • SJF - with starvation?
      • same as FCFS but as a whole?
      • Time to complete job = # of bytes requested / bandwidth (rank na lang?) [t = d / r]
      • preempt longest jobs
    • RR
      • do not accept if new job will go over the quota
      • proceed to next job
  • How do you determine distance?
    • Hops?
      • hmm, I think it should be between
      • peers, hops = 1 always
  • When do requests arrive
    • Same as above? (t = d/r?)
  • Cycle time = search time + process time + reply time + download/copy time
    • Search time = reply time
    • Process time
      • 1 if file seen?
      • ST + (summation of PT) + RT + DT until file seen or timeout
    • Download/copy time
      • reply time per byte * number of bytes
      • Download only happens when file is found (if not, it is passed on its neighbor, based on "hint"
      • Copy only happens when there is available space on the target
  • Random addition and subtraction of peers?
  • Random addition and subtraction of files?
  • If there is no peer with highest spare capacity, then distribute load to neighbors
  • Other parameters
    • # of searches
    • # of successful searches
    • # of hops before file is seen
    • list of tiles and size
    • time before file is seen
    • time before file is downloaded
    • queue for requests
    • timer for requests
    • replicate time
    • topology
    • reconstruction time
Other simulators to investigate:

Saturday, February 12, 2005

Title and Abstract

Title: Improving Scalability in Loosely Structured Peer-to-Peer (P2P) Systems
Adviser:
Dr. Cedric Angelo M. Festin

Abstract: The successes of early systems such as
Napster and Gnutella have shown that peer-to-peer (P2P) can be as powerful as the client-server model in terms of user roles. Napster and Gnutella enable users to share files in an end-to-end manner. Aside from sharing files, P2P is used for instant messaging and distributed computing. Although P2P has been increasingly used, it has problems with scalability. Unstructured, structured, and loosely structured frameworks have been applied to P2P systems with varying success to improve scalability for file sharing in the Internet. We propose to use an approach in unstructured systems and apply it to loosely structured systems. This involves control and topology adaptation used in unstructured systems in controlling the arrival and departure of messages due to querying and redirecting the data from one node to another. The redirection will be enhanced with replication. Anthill, a Java- based testbed built for the study, design and analysis of P2P systems, will be used in this research. The behavior of the proposed system will be analyzed through Anthill's simulation tools.

My abstract can also be seen
here. This piece of information was lucky enough to be awarded fourth place at the 1st Computer Science Graduate Research Abstract Competition. More information about the contest can be seen at the Asian Java Competency Program 2003 Accomplishment Report.