Presentation on theme: "P2P Architecture Case Study: Gnutella Network"— Presentation transcript:
1 P2P Architecture Case Study: Gnutella Network
Matei Rîpeanu
The University of Chicago
I am … and I’m going to talk about the Gnutella network, more specifically about the macroscopic characteristics of this large-scale, distributed system. The Gnutella network is one of the many P2P systems that appeared recently and allow users to exchange files. It’s special because it is completely decentralized: (at least until recently) all nodes performed exactly the same tasks and made decisions based only on local information.
2 Why analyze Gnutella network?
Unprecedented scale: up to 100k nodes, 100TB data, 10M files today
Self-organizing network
Staggering growth: more than 50 times during first half of 2001
Open architecture, simple and flexible protocol
Interesting mix of social and technical issues
3 Overview
Gnutella protocol
Tools for exploring the network
Network growth
Structural graph analysis
  Is Gnutella a power-law network?
Generated (overhead) network traffic
  Traffic estimates
  Overlay network topology mapping
I’m going to briefly present the protocol and the tools developed to explore the network. We used those tools to track the network over a 7-month period: November 2000 to May 2001. We analyzed the data gathered and tried to explain network growth, performed structural analysis on the network topology graph, discovered growth invariants, and analyzed Gnutella’s similarities with other large-scale systems. Finally, we analyzed the generated traffic and the match between …
4 Gnutella protocol overview
P2P file sharing application on top of an overlay network
Nodes maintain open TCP connections
Messages are broadcast (flooded) or back-propagated
Protocol:
                 Broadcast (flooding)   Back-propagated   Node to node
  Membership     PING                   PONG
  Query          QUERY                  QUERY HIT
  File download                                           GET, PUSH
5 Gnutella search mechanism
Steps:
  Node 2 initiates search for file A
[Figure: overlay network with nodes 1 to 7, query A]
6 Gnutella search mechanism
Steps:
  Node 2 initiates search for file A
  Sends message to all neighbors
[Figure: overlay network with nodes 1 to 7, query A on the links]
7 Gnutella search mechanism
Steps:
  Node 2 initiates search for file A
  Sends message to all neighbors
  Neighbors forward message
[Figure: overlay network with nodes 1 to 7, query A on the links]
8 Gnutella search mechanism
Steps:
  Node 2 initiates search for file A
  Sends message to all neighbors
  Neighbors forward message
  Nodes that have file A initiate a reply message
[Figure: overlay network with nodes 1 to 7, query A and reply A:5]
9 Gnutella search mechanism
Steps:
  Node 2 initiates search for file A
  Sends message to all neighbors
  Neighbors forward message
  Nodes that have file A initiate a reply message
  Query reply message is back-propagated
[Figure: overlay network with nodes 1 to 7, query A and replies A:7 and A:5]
10 Gnutella search mechanism
Steps:
  Node 2 initiates search for file A
  Sends message to all neighbors
  Neighbors forward message
  Nodes that have file A initiate a reply message
  Query reply message is back-propagated
[Figure: overlay network with nodes 1 to 7, replies A:7 and A:5]
11 Gnutella search mechanism
Steps:
  Node 2 initiates search for file A
  Sends message to all neighbors
  Neighbors forward message
  Nodes that have file A initiate a reply message
  Query reply message is back-propagated
  File download
[Figure: overlay network with nodes 1 to 7, "download A"]
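The steps on these slides can be sketched as a small simulation. This is a minimal sketch, not the Gnutella wire protocol: the 7-node topology, the placement of file A at nodes 5 and 7, and the names `flood_query`, `graph` and `has_file` are all assumptions made for illustration. The default TTL of 7 follows the "predominant TTL=7" figure quoted later in the talk.

```python
from collections import deque

def flood_query(graph, origin, has_file, ttl=7):
    """Flood a QUERY from `origin`; return {responder: back-propagation path}."""
    # came_from[n] is the neighbor that first delivered the query to n.
    # A QUERY HIT later travels back along exactly this reverse path.
    came_from = {origin: None}
    frontier = deque([(origin, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node != origin and node in has_file:
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits[node] = path  # responder first, query origin last
        if remaining == 0:
            continue  # TTL exhausted: answer if possible, but do not forward
        for neighbor in graph[node]:
            if neighbor not in came_from:  # a node handles each query only once
                came_from[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits

# Hypothetical overlay (undirected adjacency lists), not the one in the figure.
graph = {1: [2, 4], 2: [1, 3, 4], 3: [2, 6], 4: [1, 2, 7],
         5: [6], 6: [3, 5], 7: [4]}
print(flood_query(graph, origin=2, has_file={5, 7}))
```

With a small TTL the flood stops before reaching distant nodes, which is why the TTL value and the network diameter together decide how much of the network a query sees.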
12 Tools for network exploration
Eavesdropper: insert modified nodes into the network to eavesdrop on traffic.
Crawler: connects to all active nodes and uses the membership protocol to discover graph topology.
  Client-server approach.
Graph analysis tools: high-volume offline computations.
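The crawler idea can be sketched as a breadth-first walk over the overlay. This is only an illustration under stated assumptions: `get_neighbors` is a hypothetical callable standing in for the membership (PING/PONG) exchange with a node, and the talk does not describe how the real crawler was implemented beyond its client-server design.

```python
from collections import deque

def crawl(seed_nodes, get_neighbors):
    """Breadth-first discovery of the overlay graph starting from `seed_nodes`."""
    topology = {}
    queue = deque(seed_nodes)
    while queue:
        node = queue.popleft()
        if node in topology:
            continue  # already contacted
        try:
            neighbors = set(get_neighbors(node))
        except ConnectionError:
            neighbors = set()  # node left the network or refused the connection
        topology[node] = neighbors
        queue.extend(n for n in neighbors if n not in topology)
    return topology
```

The result is an adjacency map that the offline graph analysis tools can consume; because nodes join and leave during a crawl, any such snapshot is only approximate.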
13 Network growth
High user interest
  Users tolerate high latency, low quality results
Better resources
  DSL and cable modem nodes grew from 24% to 41% over first 6 months. Today >50%.
Although the protocol looks almost too simple, and although the failure of the Gnutella network to scale has been predicted time and again, the network managed to grow 100x in about a year (50x during the 6-month period we ran our crawler). … Graph explanations …
This growth deserves some explanations:
Open architecture / open-source environment
  Competing implementations
  Lower overhead network traffic, improved resource utilization, better structure
14 Growth invariants (1): avg. node connectivity
3.4 links per node on average
With the data gathered over these 6 months we performed some structural analysis on the topology graph. A first interesting growth invariant was that the average number of links per node stayed constant. In the graph, each point is a network: the X axis shows the size of the network and the Y axis the total number of links.
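As a quick check of this invariant, the ratio can be computed from a crawled edge list. A minimal sketch, assuming "links per node" means total links divided by total nodes; that reading is an inference, chosen because it reproduces 3.4 from the 50k nodes and 170k links quoted in the traffic estimate. The function name is illustrative.

```python
def links_per_node(edges):
    """Total undirected links divided by total nodes seen in the edge list."""
    nodes = {n for edge in edges for n in edge}
    unique_links = {frozenset(edge) for edge in edges}  # (a, b) and (b, a) count once
    return len(unique_links) / len(nodes)

# Cross-check with the aggregate numbers from the traffic slide.
print(170_000 / 50_000)  # 3.4
```

Note that the mean node degree under this convention would be twice this value, since every link has two endpoints.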
15 Growth invariants (2): network diameter
Node-to-node distance maintains similar distribution
Average node-to-node distance grew 25% while the network grew 50 times over 6 months
A more interesting invariant is related to the distribution and average values of node-to-node shortest paths for all the topology graphs we’ve obtained. In the figure each line represents a graph … The darker ones represent earlier network measurements while the lighter ones represent later network measurements. As you can see the distributions remain pretty stable … curves have the same shape … they only shift a bit right over time. And this shift is reflected in a 25% increase in average node-to-node shortest path while the network grew 50 times. Note that this is better than a random graph would do!
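The distribution discussed here can be computed from a topology snapshot with one breadth-first search per node. This is a sketch under assumptions: the graph is an adjacency map, unreachable pairs are simply skipped, and the function names are illustrative rather than taken from the analysis tools used for the talk.

```python
from collections import Counter, deque

def path_length_distribution(graph):
    """Count ordered node pairs by shortest-path length in hops."""
    counts = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        counts.update(d for d in dist.values() if d > 0)
    return counts

def average_path_length(counts):
    """Mean shortest-path length over all reachable ordered pairs."""
    total_pairs = sum(counts.values())
    return sum(d * c for d, c in counts.items()) / total_pairs
```

Running this on each snapshot and overlaying the resulting histograms gives one curve per network, as in the figure; the cost is one full search per node, which is why such computations are done offline.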
16 Is Gnutella a power-law network?
Power-law networks: the number of links per node follows a power-law distribution
Examples: the Internet, in/out links to/from HTML pages, citation network, US power grid, social networks.
[Figure: November 2000]
An interesting analysis starts from the question of whether the Gnutella network is a power-law network.
Implications: high tolerance to random node failure but low reliability when facing an ‘intelligent’ adversary
17 Is Gnutella a power-law network?
Later, larger networks display a bimodal distribution
Implications:
  High tolerance to random node failures preserved
  Increased reliability when facing an attack.
[Figure: May 2001]
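One simple way to probe the question on these two slides is to fit a straight line to the degree histogram on log-log axes. The sketch below is an assumption about method, not a description of how the figures were produced: it uses an ordinary least-squares fit, which is easy to read but known to be a rough estimator, and the function name is illustrative.

```python
import math
from collections import Counter

def fit_power_law_exponent(degrees):
    """Return k from a least-squares fit of log(count) against log(degree),
    i.e. the exponent in count ~ degree**-k."""
    hist = Counter(d for d in degrees if d > 0)
    xs = [math.log(d) for d in hist]
    ys = [math.log(c) for c in hist.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

A degree histogram that follows a power law falls on a straight line in this plot; a bimodal histogram like the later one does not, so the fitted line leaves large residuals for low-degree nodes.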
18 Overview
Gnutella protocol
Network growth
Structural graph analysis
Generated network traffic:
  Traffic estimates
  Does the Gnutella overlay network topology match the underlying resources?
19 Traffic analysis
6-8 kbps per link over all connections
Traffic structure changed over time
20 Total generated traffic
1 Gbps (or 330 TB/month)!
  Compare to 15,000 TB/month in US Internet backbone (Dec. 2000)
  Note that this estimate excludes actual file transfers
Q: Does it matter?
Reasoning:
  QUERY and PING messages are flooded. They form more than 90% of generated traffic
  predominant TTL=7
  >95% of nodes are less than 7 hops away
  measured traffic at each link about 6 kbps
  network with 50k nodes and 170k links
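The headline figure follows from the last two lines of the reasoning by simple arithmetic. The sketch below makes the assumptions explicit: each of the 170k links carries about 6 kbps in total, a month is taken as 30 days, and 1 TB is 10^12 bytes. The function names are illustrative.

```python
LINKS = 170_000
KBPS_PER_LINK = 6
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month
BACKBONE_TB_PER_MONTH = 15_000       # US Internet backbone, Dec. 2000 (from the slide)

def total_traffic_bps(links, kbps_per_link):
    """Aggregate overhead traffic in bits per second."""
    return links * kbps_per_link * 1_000

def monthly_volume_tb(bps, seconds=SECONDS_PER_MONTH):
    """Bits per second converted to terabytes per month."""
    return bps / 8 * seconds / 1e12

bps = total_traffic_bps(LINKS, KBPS_PER_LINK)
volume = monthly_volume_tb(bps)
print(bps / 1e9)                        # about 1 Gbps
print(volume)                           # about 330 TB/month
print(volume / BACKBONE_TB_PER_MONTH)   # roughly 2% of the backbone figure
```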
21 Topology mismatch
The overlay network topology doesn’t match the underlying Internet infrastructure topology!
  40% of all nodes are in the 10 largest Autonomous Systems (AS)
  Only 2-4% of all TCP connections link nodes within the same AS
  Largely ‘random wiring’
  Entropy experiment gives similar results
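The "random wiring" observation can be checked by comparing the measured share of intra-AS links with the share expected if endpoints were paired at random. This is a sketch under assumptions: `as_of` is a hypothetical mapping from node to AS identifier, and this comparison is a simpler stand-in for, not a reconstruction of, the entropy experiment mentioned on the slide.

```python
from collections import Counter

def intra_as_fraction(links, as_of):
    """Fraction of overlay links whose two endpoints sit in the same AS."""
    same = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same / len(links)

def expected_random_fraction(as_of):
    """Chance that two distinct nodes picked uniformly at random share an AS."""
    sizes = Counter(as_of.values())
    n = len(as_of)
    return sum(s * (s - 1) for s in sizes.values()) / (n * (n - 1))
```

If the measured fraction is close to the random baseline, the overlay is not exploiting network locality, which is the mismatch this slide describes.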
22 Conclusions
Gnutella: self-organizing, large-scale, P2P application based on overlay network. It works!
Growth hindered by the volume of generated traffic and inefficient resource use.
Discovered growth invariants specific to large-scale systems that:
  Help predict resource usage
  Give hints for better search and resource organization techniques.
Some solutions to help the network scale:
  Organize the overlay network to match the underlying infrastructure topology.
  Investigate methods for reducing traffic (query routing/filtering, better information organization).
  Exploit locality in user interest: small-world network (talk about our project at Chicago)
  Exploit caches
all while maintaining the self-organizing characteristics
23 Thank you!
Questions?
24 What’s next?
Organize the overlay network to match the underlying infrastructure topology.
Investigate methods for reducing traffic (query routing/filtering, better information organization).
Is the Gnutella network a small-world network? What are the implications?
(I think this one can be dropped!)
25 Statistical laws of large-scale systems
Zipf’s law: the size of the rth largest occurrence of the event is inversely proportional to its rank: $y \sim r^{-b}$, with $b$ close to unity.
Power-law distributions: the probability distribution of event X is $P[X = x] = x^{-k}$
Pareto distribution: the cumulative probability distribution is $P[X > x] = x^{-(k-1)}$
Zipf, Pareto and power-law distributions are basically different ways to express the same phenomenon
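The claim that the three forms express the same phenomenon can be made concrete with a short derivation. It is a sketch that treats every relation as a proportionality and ignores normalization constants, which is an assumption, since the slide writes the laws with equality signs.

```latex
\begin{align*}
y &\sim r^{-b}
  && \text{(Zipf: size of the $r$th largest item)} \\
P[X > y] &\sim r \sim y^{-1/b}
  && \text{(the item of size $y$ has about $r$ items at least as large)} \\
P[X = y] &\sim -\frac{d}{dy}\,P[X > y] \sim y^{-(1 + 1/b)}
  && \text{(differentiate the tail)}
\end{align*}
```

Matching the last line with $x^{-k}$ gives $k = 1 + 1/b$, so the Pareto exponent is $k - 1 = 1/b$, and $b$ close to unity corresponds to $k$ close to 2.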
27 Overview
Gnutella protocol
Network growth
Statistical properties of large-scale systems
  Power-law distributions.
  Power-law networks.
Generated (overhead) network traffic.
28 Power-law distributions
Probability distribution of event X is $P[X = x] = x^{-k}$
Present all over WWW and Internet space: the number of HTML pages within a site, visits to a site, links to a page, cache document popularity, etc.
29 Power-law distributions in Gnutella
Number of shared files per node
Query popularity follows a power-law distribution [Kas01]
Implications:
  Caching is an effective solution to reduce traffic and query latency
  New search and node organizing mechanisms!
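The caching implication can be illustrated with a back-of-the-envelope calculation. This is a sketch under assumptions: query popularity is taken to follow Zipf's law with exponent `b` over a fixed set of distinct queries, and the cache is idealized as always holding the most popular ones. The function name and the example sizes are illustrative, not measurements.

```python
def zipf_cache_hit_rate(num_queries, cache_size, b=1.0):
    """Share of requests answered by a cache holding the `cache_size`
    most popular of `num_queries` distinct queries, with popularity ~ rank**-b."""
    weights = [rank ** -b for rank in range(1, num_queries + 1)]
    return sum(weights[:cache_size]) / sum(weights)

# Caching 1% of 100,000 distinct queries under these assumptions.
print(zipf_cache_hit_rate(100_000, 1_000))
```

Because popularity is so skewed, a cache that holds only a small share of the distinct queries still serves a large share of the requests, which is the point behind the first implication on the slide.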