Network Working Group                                         E. Marocco
Internet-Draft                                            Telecom Italia
Intended status: Informational                                V. Gurbani
Expires: January 11, 2009              Bell Laboratories, Alcatel-Lucent
                                                           July 10, 2008


    Application-Layer Traffic Optimization (ALTO) Problem Statement
                draft-marocco-alto-problem-statement-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 11, 2009.

Abstract

   A significant part of Internet traffic today is generated by
   peer-to-peer applications used for file sharing, realtime
   communications and live media streaming.  Such applications often
   deal with large amounts of data in direct peer-to-peer connections,
   but they usually have little knowledge of the underlying topology,
   both at the overlay layer and the network layer.  As a result, they
   may choose their peers based on measurements and statistics which,
   in some situations, lead to suboptimal choices.  This document
   describes problems related to optimizing traffic generated by
   peer-to-peer applications through the use of link and network layer
   information.

Table of Contents

   1.  Introduction
     1.1.  Research or Engineering?
   2.  The Problem
     2.1.  Issues
       2.1.1.  Caching
       2.1.2.  Information Distribution
       2.1.3.  Topology Hiding
   3.  Use Cases
     3.1.  File sharing
     3.2.  Cache/Mirror Selection
     3.3.  Live media streaming
     3.4.  Realtime communications
     3.5.  Distributed Hash Tables
   4.  Security Considerations
   5.  Acknowledgments
   6.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   A significant part of Internet traffic today is generated by
   peer-to-peer (P2P) applications used for file sharing, realtime
   communications and live media streaming [WWW.cachelogic.picture]
   [WWW.wired.fuel].  Contrary to client/server architectures, P2P
   applications access resources (e.g., files or media relays)
   distributed across the Internet and exchange large amounts of data in
   connections that they establish directly with nodes hosting such
   resources.

   One of the advantages of P2P systems comes from the fact that the
   resources they offer are often made available through multiple
   instances.  Yet, applications are generally unaware of the topology
   of the latent overlay network and have to select among available
   instances based on information they deduce from empirical
   measurements which, in some situations, can lead to suboptimal
   choices.

   For example, metrics based on round-trip time estimation, which are
   sometimes used for initial source selection (i.e., before actual data
   transmission begins, when goodput values are unknown), perform poorly
   for file sharing applications: they tend to ignore the bandwidth and
   reliability of the underlying links, which influence file transfers
   much more than delay does.

   Many of the existing overlay networks are built on top of connections
   between peers that are established regardless of the underlying
   network topology.  In addition to simply achieving suboptimal
   performance, such networks can lead to congestion and cause serious
   inefficiencies.  As shown in [ACM.fear], traffic generated by popular
   P2P applications often crosses network boundaries multiple times,
   overloading links which are frequently subject to congestion
   [ACM.bottleneck].
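
   The point made above about round-trip-time metrics can be
   illustrated with a small Python calculation.  This is only a sketch:
   the two sources, their delay and bandwidth figures, and the crude
   transfer-time estimate (one round trip plus file size divided by
   bandwidth) are assumptions made for this example, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    rtt_s: float          # measured round-trip time, in seconds
    bandwidth_bps: float  # available bandwidth, in bits per second (assumed known)


def estimated_transfer_time(source: Source, size_bytes: int) -> float:
    """Crude estimate: one round trip plus serialization time at the given rate."""
    return source.rtt_s + (size_bytes * 8) / source.bandwidth_bps


# Hypothetical candidates hosting the same file.
SOURCES = [
    Source("nearby-slow", rtt_s=0.020, bandwidth_bps=1_000_000),
    Source("distant-fast", rtt_s=0.150, bandwidth_bps=20_000_000),
]
FILE_SIZE = 100 * 1024 * 1024  # 100 MiB

# Selection by delay alone versus selection by estimated transfer time.
best_by_rtt = min(SOURCES, key=lambda s: s.rtt_s)
best_by_time = min(SOURCES, key=lambda s: estimated_transfer_time(s, FILE_SIZE))
```

   With these assumed figures, the source with the lowest round-trip
   time is not the one with the shortest estimated transfer time,
   because for a large file the bandwidth term dominates the delay term.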

   Recent studies [ACM.ispp2p] [WWW.p4p.overview] [ACM.ono] have shown
   that if Internet Service Providers (ISPs), network operators or third
   parties in general provided reliable topology and/or bandwidth
   information to P2P applications, it would be possible to greatly
   increase application performance, reduce congestion and optimize the
   overall traffic across different networks.

   This document describes the problem of optimizing traffic generated
   by P2P applications using information provided by third parties.
   Section 2 introduces the problem and the main issues to keep in mind
   when designing a solution, while Section 3 describes some use cases
   where both P2P users and network operators would benefit from such a
   solution.

1.1.  Research or Engineering?

   At the time of writing, several solutions have been proposed to
   address the problem described in this document, both inside and
   outside the IETF [I-D.bonaventure-informed-path-selection]
   [ACM.ispp2p] [WWW.p4p.overview], all accompanied by encouraging
   simulation and field test results.  Such solutions have been proposed
   independently, but all consist of two essential parts:

   o  a discovery mechanism which can be used by a P2P application to
      find a reliable information source;

   o  a protocol used by P2P applications to query such sources in order
      to retrieve the information needed to choose the best endpoint
      among those which host a desired resource.

   It is not easy to foresee how such solutions would perform in the
   Internet; a more accurate evaluation would require representative
   data collected from real systems with a critical mass of users.
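
   The two parts listed above can be sketched in Python as follows.
   This is not taken from any of the cited proposals: the
   InformationSource class, the discover() function, the prefix-keyed
   registry and the "own prefixes first" ranking rule are all
   hypothetical, chosen only to show how discovery and query fit
   together.

```python
import ipaddress
from typing import Dict, List, Optional


class InformationSource:
    """Hypothetical information source run by, e.g., a network operator."""

    def __init__(self, name: str, local_prefixes: List[str]) -> None:
        self.name = name
        self._local = [ipaddress.ip_network(p) for p in local_prefixes]

    def rank(self, candidates: List[str]) -> List[str]:
        """Query step: return candidates with in-network endpoints first.

        sorted() is stable, so the caller's order is kept within each group.
        """
        def is_remote(addr: str) -> bool:
            ip = ipaddress.ip_address(addr)
            return not any(ip in net for net in self._local)

        return sorted(candidates, key=is_remote)


def discover(peer_addr: str,
             registry: Dict[str, InformationSource]) -> Optional[InformationSource]:
    """Discovery step: pick the source registered for the most specific
    prefix that contains the peer's address (longest-prefix match)."""
    ip = ipaddress.ip_address(peer_addr)
    matches = [(ipaddress.ip_network(prefix), source)
               for prefix, source in registry.items()
               if ip in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Documentation prefixes (RFC 5737) used as stand-in addresses.
REGISTRY = {
    "192.0.2.0/24": InformationSource("operator-a", ["192.0.2.0/24"]),
    "198.51.100.0/24": InformationSource("operator-b", ["198.51.100.0/24"]),
}

source = discover("192.0.2.10", REGISTRY)
candidates = ["198.51.100.7", "192.0.2.99", "203.0.113.5"]
# Fall back to the unranked list when no information source is found.
ranked = source.rank(candidates) if source else candidates
```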

   However, wide adoption will probably never happen without an
   agreement on a common solution based on open standards; whether such
   a solution should still be studied as a research problem or published
   as a "Proposed Standard" or an "Experimental" RFC [RFC2026] is an
   open issue.

2.  The Problem

   Network engineers have been facing the problem of traffic
   optimization for a long time now and have already designed mechanisms
   like MPLS [RFC3031] and DiffServ [RFC3260] to deal with it.  The
   problem they address consists of finding (or setting) an optimal
   route for packets traveling between specific source and destination
   addresses, based on requirements such as low latency, high
   reliability, and priority.  Such solutions are usually implemented at
   the link and network layers, and tend to be almost transparent.  At
   best, applications can only "mark" the traffic they generate with the
   corresponding properties.

   However, P2P applications, which today pose serious challenges to
   Internet infrastructures, do not benefit much from the above
   techniques; "cooperating" with external services aware of the network
   topology could greatly optimize the traffic they generate.  In fact,
   when a P2P application needs to establish a connection, the logical
   target is not a host, but rather a resource (e.g., a file or a media
   relay) generally available in multiple instances on different hosts;
   selection of the closest one -- or, in general, the best one from the
   standpoint of overlay topological proximity -- has much more impact
   on the overall traffic than the route followed by its packets to
   reach the endpoint.
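
   The difference between choosing a route and choosing an instance can
   be made concrete with a short Python sketch.  The resource name, the
   host names and the per-host counts of network boundaries are
   hypothetical inputs, assumed here to come from some external,
   topology-aware service.

```python
from typing import Dict, List

# Hypothetical inputs: the hosts holding a copy of each resource, and the
# number of network boundaries between the local peer and each host.
INSTANCES: Dict[str, List[str]] = {"file-a": ["host-1", "host-2", "host-3"]}
BOUNDARIES: Dict[str, int] = {"host-1": 3, "host-2": 0, "host-3": 1}


def first_listed(resource: str) -> str:
    """Topology-unaware choice: take whichever instance is listed first."""
    return INSTANCES[resource][0]


def closest(resource: str) -> str:
    """Topology-aware choice: take the instance behind the fewest boundaries."""
    return min(INSTANCES[resource], key=BOUNDARIES.__getitem__)


unaware = first_listed("file-a")
aware = closest("file-a")
# Boundary crossings avoided per transfer by picking the closest instance.
crossings_saved = BOUNDARIES[unaware] - BOUNDARIES[aware]
```

   In this sketch no routing decision changes at all; the saving comes
   entirely from which host is selected as the endpoint.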

   Addressing the Application-Layer Traffic Optimization (ALTO) problem
   means, on the one hand, providing topology information regarding the
   underlying network and, on the other hand, enhancing P2P applications
   so that they use such information to select the best endpoints among
   those that are available for the connections they are going to
   establish.

2.1.  Issues

2.1.1.  Caching

   A common approach to optimizing traffic generated by applications
   which require large data transfers is based on caching techniques.
   In some cases, such techniques have proven to be extremely effective
   in both enhancing user experience and saving network resources;
   however, they have two main limitations with respect to solutions
   based on the provision of topology information:

   1.  Application specificity: since a cache is meant to replace the
       source of the content being accessed -- either explicitly or
       transparently -- it must be able to speak the same protocol as
       the querying peer.  For this reason, caching solutions can
       reasonably be adopted only for the most popular applications
       (e.g., HTTP and BitTorrent).

   2.  Content awareness: since caches need to actually store the
       content being delivered, they are subject to legal threats
       whenever the user does not have the right to access such
       content.  This limitation makes caching approaches unusable in
       today's popular file-sharing systems.

   In general, solutions based on the provision of topology information
   do not interfere with caching; on the contrary, if the peer selection
   service used by applications is aware of the presence of caches, it
   can give them higher priority in its replies and thus achieve greater
   optimization.

2.1.2.  Information Distribution

   As a direct consequence of the fully distributed nature of the
   Internet, it seems almost impossible to centralize all the
   information P2P applications may need to optimize the traffic they
   generate.
   It is quite likely that such information would be highly
   distributed, for example, at an ISP or domain level.  It is also
   reasonable to expect that, in some cases, the network administrators
   themselves will control the provision of such information.

   However, as applications usually have no knowledge of the
   administrative entities running the network they are using, any
   solution will need to define a discovery mechanism (e.g., based on or
   similar to reverse DNS [RFC2317]) and perhaps an infrastructure to
   certify information sources.

2.1.3.  Topology Hiding

   Operators can play an important role in addressing the ALTO problem,
   but they generally consider the topology of the networks they control
   to be confidential information; therefore, in order to succeed and
   achieve wide adoption, any solution should provide a method to help
   P2P applications in peer selection without explicitly disclosing the
   topology of the underlying network.

3.  Use Cases

3.1.  File sharing

   File sharing applications allow users to search for content shared by
   other users and download it.  Typically, search results consist of
   many instances of the same file available from multiple sources; the
   goal of an ALTO solution would be to help peers find the best ones
   according to the topology of the underlying networks.

   On the application side, integration of ALTO functionality may happen
   at different levels.  For example, while in the completely
   decentralized Gnutella network selection of the best sources is
   totally up to the user, in systems like BitTorrent and eDonkey,
   central elements (i.e., trackers or servers) act as mediators.
   Therefore, in the former case, optimization would require
   modification of the applications, while in the latter it could just
   be implemented in some central elements.

3.2.  Cache/Mirror Selection

   Providers of popular content like media and software repositories
   usually resort to geographically distributed caches and mirrors for
   load balancing.  Selection of the proper mirror/cache for a given
   user is today based on inaccurate geolocation data or on proprietary
   network location systems, or is often delegated to the user; an ALTO
   solution could easily be adopted to automate such a selection.

3.3.  Live media streaming

   P2P applications for live streaming allow users to receive multimedia
   content produced by one source and targeted to multiple destinations,
   in a realtime or near-realtime way without resorting to multicast.
   Such applications typically participate in the distribution of the
   content, acting as both receivers and senders; the goal of an ALTO
   solution would be to help peers find the best sources and the best
   destinations for the media flows they receive and relay.

3.4.  Realtime communications

   P2P realtime communications allow users to establish direct media
   flows, usually to place audio and video calls, or to have text chats.
   In the basic case, media would flow directly between the two
   endpoints; however, in the general case, a significant portion of
   communications between users with limited access to the Internet
   (e.g., users behind NATs, firewalls or HTTP proxies) needs to be
   relayed by other elements.  Such media relays are distributed over
   the Internet -- in some cases co-located with applications with a
   public address; the goal of an ALTO solution would be to help peers
   find the best relays.

3.5.  Distributed Hash Tables

   Distributed hash tables (DHTs) are a class of overlay algorithms used
   to implement lookup functionality in popular P2P systems, without
   resorting to centralized elements.
   In such systems, peers maintain the addresses of other peers
   participating in the same DHT in a routing table, sorted according to
   specific criteria.  An ALTO solution would provide valuable
   information for DHT algorithms which, in order to reduce path latency
   of distributed queries, include round-trip time estimations among
   such criteria [SIGCOMM.resprox].

4.  Security Considerations

   The approach proposed in this document requires P2P applications to
   delegate a portion of their routing capability to third parties,
   giving them a significant role in systems from which they would
   otherwise be excluded.

   In the case where an ALTO solution is deployed by the network
   operator, it is conceivable that the P2P community would consider it
   hostile because the operator could, for example:

   o  redirect applications to corrupted mediators providing malicious
      content;

   o  track connections to perform content inspection;

   o  apply policies based on criteria other than network efficiency
      (for example, to avoid peering points regulated by inconvenient
      economic agreements).

   However, ALTO is completely optional for P2P applications and its
   purpose is to help improve the performance of such applications.  If,
   for some reason, it fails to achieve this purpose, it would simply
   fail to gain popularity and would not be used.

   Even in cases where the ALTO service provider decides to maliciously
   alter results returned by queries only after the solution has gained
   popularity (i.e., it behaves well for a while to become popular and
   then starts misbehaving), it would be fairly easy for P2P application
   maintainers and users to revert to solutions that do not use it.
   After all, it would come down to changing some application settings
   in cases where the protocol is implemented inside the client, and to
   upgrading centralized elements for architectures like BitTorrent and
   eDonkey.

5.  Acknowledgments

   We have to acknowledge many people.  For the record: Vinay Aggarwal
   and the P4P working group for the research work done outside the
   IETF.  Emil Ivov, Rohan Mahy, Anthony Bryan, Stanislav Shalunov,
   Laird Popkin, Stefano Previdi, Reinaldo Penno, Dimitri Papadimitriou
   and many others for interesting discussions, comments and
   corrections.

6.  Informative References

   [ACM.bottleneck]
              Akella, A., Seshan, S., and A. Shaikh, "An Empirical
              Evaluation of Wide-Area Internet Bottlenecks", Proceedings
              of ACM SIGCOMM, October 2003.

   [ACM.fear]
              Karagiannis, T., Rodriguez, P., and K. Papagiannaki,
              "Should ISPs fear Peer-Assisted Content Distribution?",
              In ACM USENIX IMC, Berkeley 2005.

   [ACM.ispp2p]
              Aggarwal, V., Feldmann, A., and C. Scheideler, "Can ISPs
              and P2P systems co-operate for improved performance?", In
              ACM SIGCOMM Computer Communications Review (CCR), 37:3,
              pp. 29-40.

   [ACM.ono]  Choffnes, D. and F. Bustamante, "Taming the Torrent: A
              practical approach to reducing cross-ISP traffic in P2P
              systems", Proceedings of ACM SIGCOMM, August 2008.

   [I-D.bonaventure-informed-path-selection]
              Saucez, D. and B. Donnet, "The case for an informed path
              selection service",
              draft-bonaventure-informed-path-selection-00 (work in
              progress), February 2008.

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [RFC2317]  Eidnes, H., de Groot, G., and P. Vixie, "Classless
              IN-ADDR.ARPA delegation", BCP 20, RFC 2317, March 1998.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031, January 2001.

   [RFC3260]  Grossman, D., "New Terminology and Clarifications for
              Diffserv", RFC 3260, April 2002.

   [SIGCOMM.resprox]
              Gummadi, K., Gummadi, R., Ratnasamy, S., Gribble, S.,
              Shenker, S., and I.
              Stoica, "The impact of DHT routing geometry on resilience
              and proximity", Proceedings of ACM SIGCOMM, August 2003.

   [WWW.cachelogic.picture]
              Parker, A., "The true picture of peer-to-peer
              filesharing", .

   [WWW.p4p.overview]
              Xie, H., Krishnamurthy, A., Silberschatz, A., and R. Yang,
              "P4P: Explicit Communications for Cooperative Control
              Between P2P and Network Providers", .

   [WWW.wired.fuel]
              Glasner, J., "P2P fuels global bandwidth binge", .

Authors' Addresses

   Enrico Marocco
   Telecom Italia
   Via G. Reiss Romoli, 274
   Turin  10148
   Italy

   Email: enrico.marocco@telecomitalia.it


   Vijay K. Gurbani
   Bell Laboratories, Alcatel-Lucent
   2701 Lucent Lane
   Lisle, IL  60532
   USA

   Email: vkg@alcatel-lucent.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC documents
   can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.