Network Working Group                                       E. Marocco
Internet-Draft                                          Telecom Italia
Intended status: Informational                              V. Gurbani
Expires: January 11, 2009            Bell Laboratories, Alcatel-Lucent
July 10, 2008

Application-Layer Traffic Optimization (ALTO) Problem Statement
draft-marocco-alto-problem-statement-02

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 11, 2009.

Abstract

A significant part of today's Internet traffic is generated by peer-to-peer applications used for file sharing, realtime communications and live media streaming. Such applications often deal with large amounts of data in direct peer-to-peer connections, but they usually have little knowledge of the underlying topology, both at the overlay layer and at the network layer. As a result, they may choose their peers based on measurements and statistics that, in some situations, lead to suboptimal choices. This document describes problems related to optimizing traffic generated by peer-to-peer applications through the use of link and network layer information.
Marocco & Gurbani         Expires January 11, 2009                [Page 1]

Internet-Draft           ALTO Problem Statement                July 2008

Table of Contents

1. Introduction
1.1. Research or Engineering?
2. The Problem
2.1. Issues
2.1.1. Caching
2.1.2. Information Distribution
2.1.3. Topology Hiding
3. Use Cases
3.1. File sharing
3.2. Cache/Mirror Selection
3.3. Live media streaming
3.4. Realtime communications
3.5. Distributed Hash Tables
4. Security Considerations
5. Acknowledgments
6. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements

1. Introduction

A significant part of today's Internet traffic is generated by peer-to-peer (P2P) applications used for file sharing, realtime communications and live media streaming [WWW.cachelogic.picture] [WWW.wired.fuel]. Unlike client/server applications, P2P applications access resources (e.g., files or media relays) distributed across the Internet and exchange large amounts of data over connections that they establish directly with the nodes hosting those resources.
One of the advantages of P2P systems is that the resources they offer are often available through multiple instances. Yet applications generally ignore the topology of the latent overlay network and have to select among the available instances based on information they deduce from empirical measurements, which in some situations can lead to suboptimal choices. For example, metrics based on round-trip time estimation, sometimes used for initial source selection (i.e., before actual data transmission begins, when goodput values are unknown), perform quite badly for file sharing applications: they ignore the bandwidth and reliability of the underlying links, which influence file transfers much more than delay does.

Many of the existing overlay networks are built on top of connections between peers that are established regardless of the underlying network topology. Besides achieving suboptimal performance, such networks can lead to congestion and cause serious inefficiencies. As shown in [ACM.fear], traffic generated by popular P2P applications often crosses network boundaries multiple times, overloading links that are frequently subject to congestion [ACM.bottleneck].

Recent studies [ACM.ispp2p] [WWW.p4p.overview] [ACM.ono] have shown that if Internet Service Providers (ISPs), network operators, or third parties in general provided reliable topology and/or bandwidth information to P2P applications, it would be possible to greatly increase application performance, reduce congestion and optimize the overall traffic across different networks.

This document describes the problem of optimizing traffic generated by P2P applications using information provided by third parties. Section 2 introduces the problem and the main issues to keep in mind when designing a solution, while Section 3 describes some use cases in which both P2P users and network operators would benefit from such a solution.
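The round-trip time pitfall described above can be illustrated with a toy model. This is a minimal sketch, not a measurement methodology: it assumes that transfer time is roughly one RTT plus file size divided by bottleneck bandwidth, and it ignores link reliability, TCP dynamics, and the fact that bandwidth is usually unknown before the transfer starts (which is where third-party information could help). The Candidate type, the function names, and all peer figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """A peer hosting the desired resource (hypothetical structure)."""
    name: str
    rtt_s: float          # measured round-trip time, in seconds
    bandwidth_bps: float  # bottleneck bandwidth in bits/s; assumed known,
                          # e.g. from a third-party hint


def est_transfer_time(c: Candidate, size_bytes: int) -> float:
    """Crude estimate: one RTT of setup plus serialization at the bottleneck rate.

    Ignores link reliability, TCP slow start and competing traffic.
    """
    return c.rtt_s + (size_bytes * 8) / c.bandwidth_bps


def pick_by_rtt(cands: Sequence[Candidate]) -> Candidate:
    """Delay-only heuristic: choose the peer with the lowest RTT."""
    return min(cands, key=lambda c: c.rtt_s)


def pick_by_transfer_time(cands: Sequence[Candidate], size_bytes: int) -> Candidate:
    """Choose the peer with the lowest estimated completion time."""
    return min(cands, key=lambda c: est_transfer_time(c, size_bytes))


# Hypothetical candidates and file size, for illustration only.
peers = [
    Candidate("near-but-slow", rtt_s=0.010, bandwidth_bps=1e6),
    Candidate("far-but-fast", rtt_s=0.150, bandwidth_bps=50e6),
]
size = 100 * 1024 * 1024  # a 100 MiB file
```

With these made-up figures, ranking by RTT selects the nearby 1 Mbit/s peer (roughly 839 s to complete), while the bandwidth-aware estimate selects the distant 50 Mbit/s peer (roughly 17 s).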
1.1. Research or Engineering?

At the time of writing, several solutions have been proposed to address the problem described in this document, both inside and outside the IETF [I-D.bonaventure-informed-path-selection] [ACM.ispp2p] [WWW.p4p.overview], all accompanied by encouraging simulation and field test results. These solutions were proposed independently, but all of them consist of two essential parts:

o a discovery mechanism that a P2P application can use to find a reliable information source;

o a protocol that P2P applications use to query such sources in order to retrieve the information needed to choose the best endpoint among those hosting a desired resource.

It is not easy to foresee how such solutions would perform in the Internet: a more accurate evaluation would require representative data collected from real systems used by a critical mass of users. However, wide adoption will probably never happen without agreement on a common solution based on open standards; whether such a solution should still be studied as a research problem, or be published as a "Proposed Standard" or an "Experimental" RFC [RFC2026], is an open issue.

2. The Problem

Network engineers have faced the problem of traffic optimization for a long time and have already designed mechanisms such as MPLS [RFC3031] and DiffServ [RFC3260] to deal with it. The problem these mechanisms address consists of finding (or setting) an optimal route for packets traveling between specific source and destination addresses, based on requirements such as low latency, high reliability and priority. Such solutions are usually implemented at the link and network layers and tend to be almost transparent to applications. At best, applications can only "mark" the traffic they generate with the corresponding properties.
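For DiffServ, the "marking" mentioned above amounts to setting the Differentiated Services codepoint (DSCP) in each packet's IP header. The following minimal sketch assumes a Linux- or BSD-like host, where Python's standard socket module exposes the IP_TOS option; the helper names dscp_to_tos and mark_socket are hypothetical. It only illustrates how narrow that interface is: the application supplies a 6-bit label and the network decides what to do with it, while nothing here helps the application choose which peer to connect to.

```python
import socket

# DiffServ codepoint for Expedited Forwarding (RFC 3246).
DSCP_EF = 46


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the former IPv4 TOS byte.

    The two low-order bits carry ECN and are left at zero here.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the kernel to stamp outgoing IPv4 packets with the given DSCP.

    socket.IP_TOS is available on Linux and most BSD-derived systems;
    other platforms may not expose it.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
```

A caller would typically create a socket with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and call mark_socket(sock, DSCP_EF) before sending; whether routers honor the mark depends entirely on the networks along the path.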
However, P2P applications, which today pose serious challenges to Internet infrastructures, do not benefit much from the above techniques, and "cooperating" with external services aware of the network topology could greatly optimize the traffic they generate. In fact, when a P2P application needs to establish a connection, the logical target is not a host but a resource (e.g., a file or a media relay), generally available in multiple instances on different hosts. Selecting the closest instance, or in general the best one in terms of topological proximity, has much more impact on the overall traffic than the route its packets follow to reach the endpoint.

Addressing the Application-Layer Traffic Optimization (ALTO) problem means, on the one hand, providing topology information about the underlying network and, on the other hand, enhancing P2P applications so that they use such information to select the best endpoints, among those available, for the connections they are going to establish.

2.1. Issues

2.1.1. Caching

A common approach to optimizing traffic generated by applications that require large data transfers is based on caching techniques. In some cases, such techniques have proven extremely effective both in enhancing user experience and in saving network resources; however, they have two main limitations with respect to solutions based on the provision of topology information:

1. Application specificity: since a cache is meant to replace the source of the content being accessed (either explicitly or transparently), it must speak the same protocol as the querying peer. For this reason, caching solutions can reasonably be adopted only for the most popular applications (e.g., HTTP and BitTorrent).

2.
Content awareness: since caches need to actually store the content being delivered, they are exposed to legal threats whenever the user does not have the right to access that content. This limitation makes caching approaches unusable in today's popular file-sharing systems.

In general, solutions based on the provision of topology information do not interfere with caching; on the contrary, if the peer selection service used by applications is aware of the presence of caches, it can give them higher priority in its replies and thus achieve greater optimization.

2.1.2. Information Distribution

As a direct consequence of the highly distributed nature of the Internet, it seems almost impossible to centralize all the information P2P applications may need to optimize the traffic they generate. Such information is likely to be highly distributed, for example at an ISP or domain level. It is also reasonable to expect that, in some cases, the network administrators themselves will control the provision of such information. However, as applications usually have no knowledge of the administrative entities running the network they are using, any solution will need to define a discovery mechanism (e.g., based on or similar to reverse DNS [RFC2317]) and perhaps an infrastructure to certify information sources.

2.1.3. Topology Hiding

Operators can play an important role in addressing the ALTO problem, but they generally consider the topology of the networks they control to be confidential; therefore, in order to succeed and achieve wide adoption, any solution should provide a way to help P2P applications with peer selection without explicitly disclosing the topology of the underlying network.

3. Use Cases

3.1. File sharing

File sharing applications allow users to search for content shared by other users and download it.
Typically, search results consist of many instances of the same file available from multiple sources; the goal of an ALTO solution would be to help peers find the best ones according to the underlying networks.

On the application side, ALTO functionality may be integrated at different levels. For example, while in the completely decentralized Gnutella network the selection of the best sources is entirely up to the user, in systems like BitTorrent and eDonkey central elements (i.e., trackers or servers) act as mediators. Therefore, in the former case optimization would require modifying the applications, while in the latter it could be implemented in the central elements alone.

3.2. Cache/Mirror Selection

Providers of popular content, such as media and software repositories, usually rely on geographically distributed caches and mirrors for load balancing. Today, selection of the proper mirror or cache for a given user is based on inaccurate geolocation data or on proprietary network location systems, or is often delegated to the users themselves; an ALTO solution could easily be adopted to automate such selection.

3.3. Live media streaming

P2P applications for live streaming allow users to receive multimedia content produced by one source and targeted to multiple destinations, in a realtime or near-realtime way, without resorting to multicast. Such applications typically participate in the distribution of the content, acting as both receivers and senders; the goal of an ALTO solution would be to help peers find the best sources and the best destinations for the media flows they receive and relay.

3.4. Realtime communications

P2P realtime communications allow users to establish direct media flows, usually to place audio and video calls or to have text chats.
In the basic case, media flows directly between the two endpoints; in general, however, a significant portion of communications between users with limited Internet access (e.g., users behind NATs, firewalls or HTTP proxies) needs to be relayed by other elements. Such media relays are distributed over the Internet, in some cases co-located with applications that have a public address; the goal of an ALTO solution would be to help peers find the best relays.

3.5. Distributed Hash Tables

Distributed hash tables (DHTs) are a class of overlay algorithms used to implement lookup functionality in popular P2P systems without resorting to centralized elements. In such systems, peers maintain the addresses of other peers participating in the same DHT in a routing table, sorted according to specific criteria. An ALTO solution could provide valuable information to DHT algorithms that, in order to reduce the path latency of distributed queries, include round-trip time estimates among such criteria [SIGCOMM.resprox].

4. Security Considerations

The approach proposed in this document requires P2P applications to delegate a portion of their routing capability to third parties, giving those parties a significant role in systems from which they would otherwise be excluded. Where an ALTO solution is deployed by the network operator, it is conceivable that the P2P community would consider it hostile, because the operator could, for example:

o redirect applications to corrupted mediators providing malicious content;

o track connections to perform content inspection;

o apply policies based on criteria other than network efficiency (for example, to avoid peering points regulated by unfavorable economic agreements).

However, ALTO is completely optional for P2P applications, and its purpose is to help improve the performance of such applications.
If, for some reason, it fails to achieve this purpose, it would simply fail to gain popularity and would not be used. Even if an ALTO service provider decided to maliciously alter the results returned to queries only after the solution had gained popularity (i.e., it behaves well for a while to become popular and then starts misbehaving), it would be fairly easy for P2P application maintainers and users to revert to solutions that do not use it. After all, it would come down to changing some application settings where the protocol is implemented inside the client, and to upgrading the centralized elements in architectures like BitTorrent and eDonkey.

5. Acknowledgments

We would like to acknowledge many people, in particular: Vinay Aggarwal and the P4P working group for the research work done outside the IETF; Emil Ivov, Rohan Mahy, Anthony Bryan, Stanislav Shalunov, Laird Popkin, Stefano Previdi, Reinaldo Penno, Dimitri Papadimitriou and many others for interesting discussions, comments and corrections.

6. Informative References

[ACM.bottleneck] Akella, A., Seshan, S., and A. Shaikh, "An Empirical Evaluation of Wide-Area Internet Bottlenecks", Proceedings of ACM SIGCOMM, October 2003.

[ACM.fear] Karagiannis, T., Rodriguez, P., and K. Papagiannaki, "Should ISPs fear Peer-Assisted Content Distribution?", In ACM USENIX IMC, Berkeley, 2005.

[ACM.ispp2p] Aggarwal, V., Feldmann, A., and C. Scheideler, "Can ISPs and P2P systems co-operate for improved performance?", In ACM SIGCOMM Computer Communications Review (CCR), 37:3, pp. 29-40.

[ACM.ono] Choffnes, D. and F. Bustamante, "Taming the Torrent: A practical approach to reducing cross-ISP traffic in P2P systems", Proceedings of ACM SIGCOMM, August 2008.

[I-D.bonaventure-informed-path-selection] Saucez, D. and B.
Donnet, "The case for an informed path selection service", draft-bonaventure-informed-path-selection-00 (work in progress), February 2008.

[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[RFC2317] Eidnes, H., de Groot, G., and P. Vixie, "Classless IN-ADDR.ARPA delegation", BCP 20, RFC 2317, March 1998.

[RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.

[RFC3260] Grossman, D., "New Terminology and Clarifications for Diffserv", RFC 3260, April 2002.

[SIGCOMM.resprox] Gummadi, K., Gummadi, R., Ratnasamy, S., Gribble, S., Shenker, S., and I. Stoica, "The impact of DHT routing geometry on resilience and proximity", Proceedings of ACM SIGCOMM, August 2003.

[WWW.cachelogic.picture] Parker, A., "The true picture of peer-to-peer filesharing", .

[WWW.p4p.overview] Xie, H., Krishnamurthy, A., Silberschatz, A., and R. Yang, "P4P: Explicit Communications for Cooperative Control Between P2P and Network Providers", .

[WWW.wired.fuel] Glasner, J., "P2P fuels global bandwidth binge", .

Authors' Addresses

Enrico Marocco
Telecom Italia
Via G. Reiss Romoli, 274
Turin 10148
Italy

Email: enrico.marocco@telecomitalia.it

Vijay K. Gurbani
Bell Laboratories, Alcatel-Lucent
2701 Lucent Lane
Lisle, IL 60532
USA

Email: vkg@alcatel-lucent.com

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.