The Final Flight of the Harvster

Ben-Hur "Harv" C. Viray. MS Computer Science candidate. Network and Distributed Systems Group. University of the Philippines, Diliman

Thursday, February 17, 2005

Thesis Summary

Problem: Loosely structured peer-to-peer systems (LSP2P) not scalable enough

Solution:
  • Distributed flow control
  • Topology construction algorithm
  • Best neighbor replication strategy
How?
  • Link of peer with highest outgoing rate (most number of requests) get redirected to the peer with the highest spare capacity (both peers are neighbors of the requested peer)
  • Requested data is replicated at the peer with the highest spare capacity (the best neighbor)
Approach:
  • Compare a LSP2P system (Anthill's Gnutant? Freenet?) with a modified one (Harvster?)
  • To determine search success rate: # of searches vs. # of successful searches
  • Average search depth: #hops vs. #searches (Wonder if I can track more than a hop)
  • Documents (bytes) copied
Tools:
  • Anthill - discontinued in favor of a more scalable and lightweight simulator called Peersim, developed under the BISON project. Heavily lacks documentation and tutorials!
  • Other simulation tools?
Strategy:
  • Populate peers using a simulation model
  • Analyze using random loads and get statistics (graph 'em perhaps)
  • Change protocols
  • Parameters
    • Outgoing load
    • Capacity
    • Neighbor + load (sum of outgoing load)
    • Neighbor load (total)
    • Neighbor capacity
    • Load priority (depends on priority strategy chosen)
    • Bandwidth
    • Distance
Concerns:
  • How do we determine load metrics?
    • # of files? - maybe not
    • # of bytes?
  • Bandwidth and distance (not sure if this is needed) not important because we are after scalability?
  • To simulate a real environment, we will use the bandwidth distribution used in Gnutella (research more on this)
  • How do you define capacities? Random?
  • How do you define priorities for preemptions?
    • FCFS
      • as a whole (total # of bytes) [maybe not] or individual (bytes)
      • a request can be any # of bytes so to avoid confusion, we'll use bytes
      • preempt last ones until capacity is met
    • SJF - with starvation?
      • same as FCFS but as a whole?
      • Time to complete job = # of bytes requested / bandwidth (rank na lang?) [t = d / r]
      • preempt longest jobs
    • RR
      • do not accept if new job will go over the quota
      • proceed to next job
  • How do you determine distance?
    • Hops?
      • hmm, I think it should be between
      • peers, hops = 1 always
  • When do requests arrive
    • Same as above? (t = d/r?)
  • Cycle time = search time + process time + reply time + download/copy time
    • Search time = reply time
    • Process time
      • 1 if file seen?
      • ST + (summation of PT) + RT + DT until file seen or timeout
    • Download/copy time
      • reply time per byte * number of bytes
      • Download only happens when file is found (if not, it is passed on its neighbor, based on "hint"
      • Copy only happens when there is available space on the target
  • Random addition and subtraction of peers?
  • Random addition and subtraction of files?
  • If there is no peer with highest spare capacity, then distribute load to neighbors
  • Other parameters
    • # of searches
    • # of successful searches
    • # of hops before file is seen
    • list of tiles and size
    • time before file is seen
    • time before file is downloaded
    • queue for requests
    • timer for requests
    • replicate time
    • topology
    • reconstruction time
Other simulators to investigate:

0 Comments:

Post a Comment

<< Home