The Final Flight of the Harvster: February 2005

Problem: Loosely structured peer-to-peer systems (LSP2P) not scalable enough

Solution:

Distributed flow control
Topology construction algorithm
Best neighbor replication strategy

How?

Link of peer with highest outgoing rate (most number of requests) get redirected to the peer with the highest spare capacity (both peers are neighbors of the requested peer)
Requested data is replicated at the peer with the highest spare capacity (the best neighbor)

Approach:

Compare a LSP2P system (Anthill's Gnutant? Freenet?) with a modified one (Harvster?)
To determine search success rate: # of searches vs. # of successful searches
Average search depth: #hops vs. #searches (Wonder if I can track more than a hop)
Documents (bytes) copied

Tools:

Anthill - discontinued in favor of a more scalable and lightweight simulator called Peersim, developed under the BISON project. Heavily lacks documentation and tutorials!
Other simulation tools?

Strategy:

Populate peers using a simulation model
Analyze using random loads and get statistics (graph 'em perhaps)
Change protocols
Parameters

Outgoing load
Capacity
Neighbor + load (sum of outgoing load)
Neighbor load (total)
Neighbor capacity
Load priority (depends on priority strategy chosen)
Bandwidth
Distance

Concerns:

How do we determine load metrics?

# of files? - maybe not
# of bytes?

Bandwidth and distance (not sure if this is needed) not important because we are after scalability?
To simulate a real environment, we will use the bandwidth distribution used in Gnutella (research more on this)
How do you define capacities? Random?
How do you define priorities for preemptions?

FCFS

as a whole (total # of bytes) [maybe not] or individual (bytes)

a request can be any # of bytes so to avoid confusion, we'll use bytes
preempt last ones until capacity is met

SJF - with starvation?

same as FCFS but as a whole?
Time to complete job = # of bytes requested / bandwidth (rank na lang?) [t = d / r]
preempt longest jobs

do not accept if new job will go over the quota
proceed to next job

How do you determine distance?

Hops?

hmm, I think it should be between
peers, hops = 1 always

When do requests arrive

Same as above? (t = d/r?)

Cycle time = search time + process time + reply time + download/copy time

Search time = reply time
Process time

1 if file seen?
ST + (summation of PT) + RT + DT until file seen or timeout

Download/copy time

reply time per byte * number of bytes

Download only happens when file is found (if not, it is passed on its neighbor, based on "hint"
Copy only happens when there is available space on the target

Random addition and subtraction of peers?
Random addition and subtraction of files?
If there is no peer with highest spare capacity, then distribute load to neighbors
Other parameters

# of searches
# of successful searches
# of hops before file is seen
list of tiles and size
time before file is seen
time before file is downloaded
queue for requests
timer for requests
replicate time
topology
reconstruction time

Other simulators to investigate:

Serapis - Java, cluttered!
P2PSim - dead link?
Narses
PlanetSim
Freenet Simulator using Java - dead link
Generic P2P simulator - dead link
Hui Zhang's Freenet Simulator - looks ok but written in C :(
Andy Holvoet's P2P-sim - Java! Looks nice :)

Title: Improving Scalability in Loosely Structured Peer-to-Peer (P2P) Systems
Adviser: Dr. Cedric Angelo M. Festin

Abstract: The successes of early systems such as Napster and Gnutella have shown that peer-to-peer (P2P) can be as powerful as the client-server model in terms of user roles. Napster and Gnutella enable users to share files in an end-to-end manner. Aside from sharing files, P2P is used for instant messaging and distributed computing. Although P2P has been increasingly used, it has problems with scalability. Unstructured, structured, and loosely structured frameworks have been applied to P2P systems with varying success to improve scalability for file sharing in the Internet. We propose to use an approach in unstructured systems and apply it to loosely structured systems. This involves control and topology adaptation used in unstructured systems in controlling the arrival and departure of messages due to querying and redirecting the data from one node to another. The redirection will be enhanced with replication. Anthill, a Java- based testbed built for the study, design and analysis of P2P systems, will be used in this research. The behavior of the proposed system will be analyzed through Anthill's simulation tools.

My abstract can also be seen here. This piece of information was lucky enough to be awarded fourth place at the 1st Computer Science Graduate Research Abstract Competition. More information about the contest can be seen at the Asian Java Competency Program 2003 Accomplishment Report.

The Final Flight of the Harvster

Thursday, February 17, 2005

Thesis Summary

Saturday, February 12, 2005

Title and Abstract

P2P News

Previous Posts

Archives