[rrg] Recommendation suggestion from RW (v2)
Robin Whittle <rw@firstpr.com.au> Mon, 08 March 2010 19:19 UTC
Message-ID: <4B954DCF.30600@firstpr.com.au>
Date: Tue, 09 Mar 2010 06:19:43 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: RRG <rrg@irtf.org>
Subject: [rrg] Recommendation suggestion from RW (v2)
This is an updated version of msg06162, revised to use terminology compatible with DRTM. There is no difference in technical meaning, but I have updated the first few administrative paragraphs.

This text is compatible with the Ivip drafts. The Ivip IDs are:

    http://tools.ietf.org/html/draft-whittle-ivip-arch            04
    http://tools.ietf.org/html/draft-whittle-ivip-drtm            01
    http://tools.ietf.org/html/draft-whittle-ivip-glossary        01
    http://tools.ietf.org/html/draft-whittle-ivip-etr-addr-forw   00

  - Robin

Further to my suggestion (msg06161) that people write the Recommendation they wish everyone would agree with, here is some text for the currently empty Recommendation section of:

    http://tools.ietf.org/html/draft-irtf-rrg-recommendation-06

When I wrote this, I wrongly assumed that the RRG participants were to collaborate in some way on writing the Recommendation, and that it had to be done by 2010-03-08. Please see recent messages from Tony Li and me in the "Recommendation and what happens next" thread, especially Tony's msg06192, my msg06204 and whatever follows, regarding how the Recommendation will be written.

I would be very happy if everyone agreed to this. However, I know that some or many people won't agree with it. I hope to prompt some responses, with reasons for disagreement, such as detailed arguments regarding:

1 - Why proposals other than Ivip would be the best choice. (Or, if someone wants to, why Ivip is the best choice.)

2 - Why (in all CEE = Locator/ID Separation architectures) it is desirable for all hosts to take on extra routing and addressing responsibilities in order to avoid the additions to the routing system which are required in any CES architecture. These additional responsibilities involve more complexity, packets and traffic for all hosts, including those on potentially slow and flaky wireless links. The typical need to perform two (or sometimes one) extra Identifier -> Locator mapping lookups will frequently delay the establishment of communications significantly, since all CEE proposals involve a global mapping lookup system, which is inherently prone to being slow and unreliable.

3 - Why a CEE architecture is superior to any CES architecture for IPv6.

4 - What to do about the IPv4 scaling problem if the recommendation is purely for a CEE architecture for IPv6.

Also, I hope to prompt other people into writing their own Recommendation text - the text they wish everyone else would agree with.

No apologies for length. We are advising on how to proceed with a once-in-several-decades upgrade to IPv4 - the biggest and most widely adopted IT system on the planet - and its heir-apparent, IPv6.

This report is the product of an effort which began in 2006 with the RAWS workshop in Amsterdam.


17 - Recommendation
-------------------

17.1 - Summary
--------------

We believe that IETF development should focus on one or more proposals along the lines of Ivip or LISP. Both are Core-Edge Separation (CES) architectures which are, in principle, suitable for solving the routing scalability problems of IPv4 and IPv6, while also supporting a global approach to mobility for both IP protocols. In section 17.6 we list the features of such architectures which are most desirable, and our reasons for doing so.

Both LISP and Ivip are at an early stage of development. The exact form and names of the future enhancements for IPv4 and IPv6 are less important than the architectural elements they contain, and how these elements fit together to provide synergies, architectural elegance and suitably high levels of performance, robustness and security.

However, we believe that the term "LISP" - an acronym for "Locator/ID Separation Protocol" - is not appropriate for a CES architecture, since only Core-Edge Elimination (CEE) architectures implement "Locator / Identifier Separation". Loc/ID Separation is a fundamentally different naming model from the one used in IPv4 and IPv6.

While we do not wish to stymie the development of CEE architectures, we believe that no such architecture is suitable for solving the practical problems facing both IPv4 and IPv6. CEE architectures are not applicable to IPv4, and could only provide scalable routing benefits to IPv6 once they were adopted by all hosts.

A more fundamental objection to CEE architectures and their "Locator / Identifier Separation" naming model is that the extra burdens this model places on all hosts cannot be justified, either in terms of greater host control over the paths taken by their packets, or in terms of avoiding the greater routing system complexity which is inherent in all CES proposals.

Both CES and CEE proposals involve the creation of a new mapping system (or in some cases, for CEE, a greater reliance on DNS). CES architectures require a mapping system in which an "edge" address can be looked up to return information which enables the selection of a particular "core" address to which the traffic packet will be tunneled. CEE architectures require a system to which a query can be sent with an Identifier, returning one or more Locators. Thus, CEE architectures are not fundamentally simpler in this regard than CES architectures.

When two CEE hosts begin a communication, each host typically needs to perform an ID->Loc lookup before the initial two-packet exchange can be completed. While there is typically a similar requirement for mapping lookups with a CES architecture, the CEE arrangements are generally more burdensome, and the CES lookups do not involve the hosts.
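The difference in what the two kinds of mapping system return, and who performs the lookup, can be sketched in Python. This is a minimal, hypothetical model, not the interface of Ivip, LISP or any CEE proposal: the CES table maps "edge" prefixes (documentation prefixes chosen for illustration) to the "core" address of an ETR, with the lookup done by an ITR rather than the host, while the CEE table maps an Identifier, modelled here as an opaque string, to one or more Locators, with the lookup done by the sending host itself. Actual Identifier formats differ between CEE proposals.

```python
import ipaddress
from typing import Optional

# Hypothetical CES mapping: "edge" prefix -> "core" address of the ETR to tunnel to.
CES_MAPPING = {
    ipaddress.ip_network("192.0.2.0/28"): ipaddress.ip_address("198.51.100.1"),
    ipaddress.ip_network("192.0.2.16/28"): ipaddress.ip_address("203.0.113.7"),
}

def ces_itr_lookup(dest: str) -> Optional[ipaddress.IPv4Address]:
    """Performed by an ITR: return the ETR address, or None for "core" space."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in CES_MAPPING if addr in net]
    if not matches:
        return None  # ordinary "core" destination: forward without tunneling
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return CES_MAPPING[best]

# Hypothetical CEE mapping: Identifier -> Locators currently valid for that host.
CEE_MAPPING = {
    "id:host-b": ["2001:db8:a::b", "2001:db8:b::b"],
}

def cee_host_lookup(identifier: str) -> list:
    """Performed by the sending host itself before it can address the packet."""
    return CEE_MAPPING.get(identifier, [])

print(ces_itr_lookup("192.0.2.20"))    # edge address -> ETR 203.0.113.7
print(ces_itr_lookup("198.51.100.5"))  # core address -> None
print(cee_host_lookup("id:host-b"))
```

In the CES case the sending host simply emits a packet addressed to the "edge" address; only the ITR consults the mapping. In the CEE case the host cannot fill in the destination field until its own lookup returns.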

A more detailed explanation of our concerns about the extra delays, packets and host responsibilities inherent in CEE architectures appears in section 17.4.

While we have made a clear decision supporting architectural elements of two particular CES proposals, we recognise the contribution made to scalable routing by the proponents of all the other proposals. Scalable routing is a decades-old problem for Internet communication, and wide-ranging discussions and a diversity of approaches have been essential in choosing the best path and in prompting improvements to the most promising architectures.


17.2 - Goals
------------

17.2.1 - Statements of problems and goals
-----------------------------------------

The RRG did not reach consensus on the nature of the scalable routing problem, or on the goals for a solution. Our choices have been based on a broad common understanding of the nature of the scaling problem, and on our belief that any future architectural enhancement for solving the currently evident scaling problem should also support, or at least not impede, arrangements which will support global mobility for IPv4 and IPv6 devices - which may soon number in the billions.

Two documents were prepared concerning scalable routing.

The first is draft-irtf-rrg-design-goals-01, which was announced on 2007-07-11 on the basis of some early discussions. While some changes to it were suggested on 2007-07-14, these were neither acknowledged nor debated. The draft has not been revised or widely discussed, and there has been no test of consensus support for it.

The second, [I-D.narten-radir-problem-statement], is an attempt at documenting the scalable routing problem. While it has been updated somewhat since its inception in July 2007, and occasionally discussed in the RRG, in March 2010 it was still being discussed and there had been no test of consensus support for it.

As part of these discussions, on 2010-03-02 Geoff Huston (msg06152 and http://www.potaroo.net/presentations/2010-02-01-bgp2009.pdf) provided a recent analysis of his ongoing measurements of the BGP control plane. Geoff's work in this field has been vital to our understanding of scalable routing, and this analysis is an important document.

In 2009 an attempt was made to list the constraints imposed on any scalable routing solution by the need for widespread voluntary adoption [http://www.firstpr.com.au/ip/ivip/RRG-2009/constraints/]. This list was improved after some list discussions. It has not received consensus support either, but no detailed critiques of it have appeared.

This Recommendation is based on an understanding of the need for enhancements to IPv4 and IPv6 for the reasons summarised in the following sub-sections.

While some have argued that the IPv4 routing scalability problem does not need to be fixed, due to the imminent widespread adoption of IPv6 (in turn driven by the looming shortage of IPv4 address space), a larger body of opinion is that the IPv4 routing scaling problem is important, and more urgently in need of a solution than IPv6's. Several people have expressed the view that the shortage of fresh IPv4 address space is likely to drive a higher rate of growth in IPv4 DFZ prefixes, as space is more finely divided in order to maximise utilization.

No convincing arguments have been advanced for why IPv4 will not be very widely used for the foreseeable future. Currently advertised space amounts to about 130 /8s, which is around 60% of the global unicast space which could be advertised and used by hosts. Significant adoption of IPv6 seems likely at some stage - for instance, due to mass-adoption mobile devices being given IPv6 global unicast space, rather than, or in addition to, IPv4 access behind NAT.
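As a rough check on the "around 60%" figure, the arithmetic can be sketched in Python. This is a minimal sketch under stated assumptions: only whole /8s are counted; 0/8, 10/8, 127/8 and 224.0.0.0/3 (multicast plus reserved space) are excluded; and smaller special-purpose blocks such as 172.16.0.0/12 and 192.168.0.0/16 are ignored. The figure of 130 advertised /8s is the one given in the text.

```python
# Rough share of usable IPv4 /8s that are currently advertised.
# Assumption: count whole /8s only; exclude 0/8, 10/8, 127/8 and
# 224.0.0.0/3 (224-255: multicast plus reserved). Smaller special-purpose
# blocks (172.16.0.0/12, 192.168.0.0/16, ...) are ignored.
TOTAL_SLASH8 = 256
EXCLUDED = {0, 10, 127} | set(range(224, 256))

usable_slash8 = TOTAL_SLASH8 - len(EXCLUDED)  # 221 under these assumptions
advertised_slash8 = 130                        # figure quoted in the text
share = advertised_slash8 / usable_slash8

print(f"{usable_slash8} usable /8s, {share:.1%} advertised")
```

Under these assumptions the result is about 59%, consistent with the "around 60%" in the text.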

17.2.2 - Portability, Multihoming and inbound Traffic Engineering (TE)
----------------------------------------------------------------------

Many discussions of scalable routing identify the problem faced by a PA-using end-user network when choosing another ISP: the need to renumber all hosts and routers in its network. The severity of this problem scales approximately with the size of the network and the number of hosts it contains. However, it is greatly compounded by the network's host addresses appearing in configuration files, ACLs (Access Control Lists) and other places outside the network itself.

As an example, a university's range of IP addresses may appear in the ACLs of dozens or hundreds of academic journal sites, with each such ACL needing to be updated if the university needed to renumber its network.

A commercial example of the difficulties caused by renumbering is an end-user network which provides services for its customers, with those customers' DNSes containing IP addresses of this network's hosts. Renumbering would involve costly, error-prone and carefully timed changes to the DNSes of potentially numerous other organisations beyond the network itself.

IPv6's arrangements for automatic renumbering of networks and their hosts are not an adequate solution to the "pain of renumbering" problem. The only solution is portability of the address space between ISPs. This ability to use one set of addresses via multiple ISPs is also a key requirement for multihoming - unless all hosts and their protocols are altered, as is the case with a CEE "Locator / Identifier Separation" architecture. Consequently we encapsulate the entire "renumbering" problem as the need for "portability" - the only solution which meets the needs of end-user networks.

There is universal agreement that the scalable routing problem can only be solved by the scalable provision of at least three benefits to a much larger number of end-user networks than are currently able to access them: Portability, Multihoming and - in the case of multihomed end-user networks - the potential for inbound TE. Inbound TE involves steering traffic between two or more ISPs for purposes such as load balancing, optimising costs and ensuring low latency for latency-sensitive traffic.

In brief, the two elements of the (non-mobile) routing scaling problem are:

1 - The burden on all DFZ routers and the DFZ control plane in general due to the increasing number of end-user networks advertising their PI prefixes - which is currently the only means by which they can achieve portability, multihoming and/or inbound traffic engineering (TE).

    This burden involves an unfair distribution of costs - directly to all networks which run DFZ routers, and thereby to all Internet users, since these networks include all ISPs. It also involves problems with the technical capacity of DFZ routers to perform their task - particularly those with many neighbours, since the RIB burden scales approximately with the number of prefixes multiplied by the number of neighbours.

    This burden raises important concerns about the ability of the entire BGP-based DFZ control plane to respond rapidly and appropriately to outages. The sheer number of prefixes affected by a single outage results in slower than optimal processing by some or many DFZ routers - which could lead to patterns of convergence even less optimal than those observed at present.

    While most focus is on the RIB burden, it is also important to limit the growth in the number of prefixes routers' FIBs must handle. This should occur as a natural consequence of limiting or reversing the growth in the number of DFZ-advertised prefixes.

2 - The high cost and administrative barriers faced by end-user networks in achieving portability, multihoming and/or inbound TE. The number of non-mobile end-user networks which require these benefits is probably of order 10^7 - indicating that only a tiny fraction of such networks are currently able to achieve them using the only approach currently available: PI prefixes advertised in the DFZ.

Consequently, a good solution to the problem would enable many more end-user networks to achieve portability, multihoming and inbound TE with far less impact on the DFZ control plane. Ideally, the new arrangements would be attractive to current PI-using end-user networks - enabling some, many or perhaps all of them to adopt the new scalable approach, and so no longer advertise each of their prefixes in the DFZ.

17.2.3 - IPv4 Address Exhaustion
--------------------------------

The best-known problem the Internet faces is the exhaustion of fresh supplies of IPv4 space. Fortunately, we believe that without any extra elaboration, the CES architectures Ivip and LISP will both allow a significant improvement in the utilization efficiency of IPv4 space, without excessively burdening the BGP control plane.

Improved utilization efficiency is not a complete solution to the problem, but it is vital that these improvements be possible in a scalable fashion. Otherwise there will be a massive increase in the number of DFZ prefixes as IPv4 space is sliced more and more finely in order to actively use more of the global unicast address space.

17.2.4 - Global Mobility
------------------------

In some RRG discussions and proposals, mobility has been characterized as something which is already catered for adequately by existing Mobile IP (MIP) techniques - and which should not be the concern of the interdomain routing system or any enhancements to it.

A contrary view is that MIP techniques are inadequate for the challenges of mass-adoption mobility, and that a new approach, TTR (Translating Tunnel Router) Mobility (http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf), provides benefits beyond those obtainable via current or likely future MIP techniques.

While mobility is not discussed in the above-mentioned documents concerning problems and goals, it does have a prominent place in the RRG's Charter:

    At the moment, the Internet routing and addressing architecture is facing challenges in scalability, mobility, multi-homing, and inter-domain traffic engineering. Thus the RRG proposes to focus its effort on designing an alternate architecture to meet these challenges.

A looming problem is how to accommodate billions of mobile hand-held devices with IPv4 or IPv6 capabilities, including session survival as the device connects to new access networks. An upper limit on the number of these devices is of order 10^10.

For these devices to be directly reachable by existing hosts, each device needs at least one portable global unicast address, and must be able to exercise the equivalent of multihoming on a dynamic, second-to-second basis - including the use of access networks with which the owner of the mobile device may have no prior relationship or knowledge.

Each such device would ideally have at least one global unicast address which it retains no matter which access network(s) it is connected to, which address(es) it is given on those networks, whether or not those connections are behind one or more layers of NAT, and whether its retained address is conventional or of the new kind of "edge" address provided by CES architectures.

Conventional Mobile IP approaches appear to be inadequate for these tasks, since - if path lengths are generally to be optimal - they involve mobile and sometimes correspondent hosts in new protocols and management exchanges.

MIP also burdens the mobile host with significant responsibilities when it changes its address - with further problems if two communicating mobile hosts both change addresses at about the same time.

We believe the TTR Mobility architecture is suitable for providing mass-scale global mobility for both IPv4 and IPv6. TTR Mobility is an extension of the CES architecture, and we believe that architectures such as Ivip or LISP will support TTR Mobility well, without requiring any additional architectural complexity to do so. Optimal TTR Mobility support involves real-time mapping distribution - as should be possible on a scalable basis with Ivip's Distributed Real Time Mapping (DRTM) system [draft-whittle-ivip-drtm], which may also be applicable to LISP.

17.2.5 - Security and robustness
--------------------------------

The Internet faces other significant problems, including vulnerability to Distributed Denial of Service (DDoS) attacks, and general problems with the security of hosts against a variety of forms of attack. While we cannot foresee how the proposed CES architectures could help significantly with these problems, we believe it should be possible to devise a CES-based solution which does not significantly compromise the security, privacy or robustness of today's IPv4 or IPv6 communications.


17.3 - Categorizing the proposals
---------------------------------

17.3.1 - Mapping only
---------------------

We believe that the following proposals concern mapping systems only, and so do not constitute complete scalable routing proposals:

    Compact routing in locator identifier mapping system
    Layered mapping system (LMS)
    2-phased mapping
    Enhanced Efficiency of Mapping Distribution Protocols in Map-and-Encap Schemes

17.3.2 - Neither CEE nor CES
----------------------------

Several other proposals do not appear to provide solutions which adequately address the number of end-user networks which need Portability, Multihoming and inbound TE - which is of order 10^7.
Nor do they adequately address the need for mass-adoption global mobility. The critiques of these proposals mention these shortcomings.

    hIPv4
    Name overlay (NOL)
    Evolution - Aggregation with Increasing Scopes

17.3.3 - CEE and CES proposals
------------------------------

We find that the remaining proposals - all of which may, in principle at least, be capable of solving the routing scaling problem - fall into two categories: Core-Edge Elimination (CEE) architectures and Core-Edge Separation (CES) architectures, the defining aspects of which are described below.

The four CEE proposals are:

    GLI-Split
    ILNP - Identifier-Locator Network Protocol
    Name-Based Sockets
    RANGI

The four CES proposals are:

    IRON-RANGER
    Ivip
    LISP
    TIDR

The widespread use of the terms Core-Edge Elimination and Core-Edge Separation began with this paper:

    Towards a Future Internet Architecture: Arguments for Separating Edges from Transit Core
    Dan Jen, Lixia Zhang, Lan Wang, Beichuan Zhang
    http://conferences.sigcomm.org/hotnets/2008/papers/18.pdf

An attempt to trace the origins of these terms [msg05966] indicates they were developed in July 2008.

The CEE/CES distinction has been controversial. Firstly, some RRG members have stated that the distinction is unhelpful and/or does not reflect architectural differences. However, no detailed arguments have been presented to support this view.

A second matter of discussion (msg06110) is whether the "separation" of CES has a dual meaning: not just the separation of one scalable "edge" subset of global unicast address space from the remaining "core" addresses, but also what might better be referred to as the "isolation" of all hosts on "edge" addresses from being able to send packets to "core" addresses.

This Recommendation is based on an understanding of the CEE/CES distinction which is more fully discussed in (msg06110) and (msg05865), and which can be summarised as follows. These interpretations involve core and edge addresses.

No scalable routing architecture actually eliminates "edge" networks, or separates "edge" networks from "core" networks - since they are already separate.

CEE  Core-Edge (address) Elimination:

    A class of scalable routing architectures in which the Locator / Identifier Separation naming model is introduced, replacing the current IPv4/v6 model in which the IP address plays both the Identifier and Locator roles.

    Hosts have portability of Identifiers and are able to do multihoming and inbound TE on IP addresses derived from two or more PA prefixes from two or more ISPs. These addresses are conventional PA "core" addresses, and each such PA prefix for an end-user network is fully aggregated into a shorter prefix which the ISP advertises. Therefore, this use of address space is regarded as scalable. (This doubling or more of each end-user network's address requirements is one of the reasons CEE architectures are impractical for IPv4.)

    Consequently there is no need for any "edge" addresses - the unscalable PI addresses which are currently the only way of achieving portability, multihoming and inbound TE. A fully adopted CEE architecture therefore eliminates the need for "edge" addresses, and so for the distinction between "core" and "edge" addresses.

    The full sense of "Elimination" in this term is "Elimination of the need to distinguish between core and edge addresses, since there is no longer a need for edge addresses."

CES  Core-Edge (address) Separation:

    A subset of the global unicast address space is separated from the remaining "core" global unicast addresses, and is supported by the CES system (ITRs, mapping system and ETRs) to be used as a new form of address space by which end-user networks can obtain the benefits of Portability, Multihoming and inbound TE in a scalable manner. This space is used in a manner which has far less impact on the DFZ control plane per end-user prefix than current PI techniques.

    The full sense of "Separation" in this term is "Separation of the global unicast address space into a scalable 'edge' subset, with the remainder being 'core' addresses". Since Portability, Multihoming and inbound TE can be provided with this new scalable "edge" space, there is no longer any need for the unscalable PI edge space.


17.4 - Core-Edge Elimination architectures
------------------------------------------

The four CEE architectures:

    GLI-Split
    ILNP - Identifier-Locator Network Protocol
    Name-Based Sockets
    RANGI

differ considerably in their principles of operation. Some - Name-Based Sockets and RANGI - require upgraded applications. All require upgraded IPv6 stacks. None are practical for IPv4. Some require new network elements. Some involve optional new network element functions and optional upgraded applications.

All of them face extremely high barriers to adoption, for the following reasons.

Firstly, they are only applicable to IPv6, because their multihoming arrangements require at least double the amount of global unicast address space each end-user network actually needs.

Secondly, they all require host stack upgrades - and those which require upgraded applications are at a further disadvantage.

Thirdly, they all provide substantial benefits to adopters (Portability, Multihoming and inbound TE for all, or almost all, communications) only after all, or almost all, hosts in the IPv6 Internet have also been upgraded.

Finally, they all provide substantial routing scaling benefits - the provision of benefits to adopters for all or almost all of their communications, and consequently the ability of PI-using networks to relinquish their PI space - only after all, or almost all, hosts in the IPv6 Internet have been upgraded.
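The "at least double" point in the first reason can be illustrated with a small Python calculation. It is a minimal sketch with hypothetical numbers: a site needing 200 addresses is assumed to receive, from each ISP, the smallest power-of-two PA prefix that holds them (CEE), compared with a single portable "edge" prefix of the same size reachable via both ISPs (CES). Real allocation policies vary, and CES edge space may be divided more finely than this.

```python
import math

def prefix_len_for(addresses_needed: int, bits: int = 32) -> int:
    """Longest prefix length whose block still holds addresses_needed addresses."""
    return bits - math.ceil(math.log2(addresses_needed))

def addresses_consumed(addresses_needed: int, prefixes: int, bits: int = 32) -> int:
    """Total addresses used when the site holds `prefixes` blocks of that size."""
    block = 2 ** (bits - prefix_len_for(addresses_needed, bits))
    return prefixes * block

needed = 200  # hypothetical site size
isps = 2      # hypothetical number of upstream ISPs

# CEE: one PA prefix from each ISP, so hosts have a Locator per ISP.
cee_total = addresses_consumed(needed, isps)
# CES: one portable "edge" prefix, reachable via either ISP.
ces_total = addresses_consumed(needed, 1)

print(prefix_len_for(needed), cee_total, ces_total)  # 24 512 256
```

With three ISPs the CEE figure becomes three times the CES one, under the same assumptions.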

There is also a problem in that CEE architectures provide no obvious alternative to the functions currently performed by an ACL - such as allowing access to an academic publisher's site from any host in a university's prefix of IP addresses.

CEE architectures cannot implement ACLs using the Locators in packets, since these are inherently unstable. For instance, the packets sent by a host in a multihomed site which uses inbound TE may arrive with Locators of different ISPs, even within one session, in order to cause the return packets to be sent through one ISP or another. Another ISP, with a consequent need for different Locators, could be added at any time.

CEE architectures may not be able to have routers filter packets on any Identifier value found in the source fields of the packet, since these are not necessarily numeric - or if numeric, may not be organised in a stable hierarchical fashion suitable for the simple masking operations of an ACL.

These problems rule CEE architectures out of consideration for IPv4, but do not entirely rule out their adoption over a long period of time for IPv6. At their current early stage of development, none of the four CEE proposals appears to be applicable to IPv6's ULA. Also, these proposals generally involve complex interworking arrangements with non-upgraded hosts.

In order to deliver full benefits to adopters, and full routing scaling benefits, CEE proposals all require that every host's stack (with non-upgraded applications) - or every host's stack and applications - adopt a new naming model which has long been regarded as architecturally superior to the one used today in IPv4 and IPv6: Locator / Identifier Separation. Instead of today's model, in which the IP address performs both Identifier and Locator roles, Loc/ID Separation involves two separate objects for these roles, in two separate namespaces. The exact naming models of these proposals vary, but they all have this in common.
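The ACL problem described above can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch using hypothetical documentation prefixes: 2001:db8:100::/48 stands in for a university's stable prefix, while 2001:db8:a::/48, 2001:db8:b::/48 and 2001:db8:c::/48 stand in for the PA prefixes of three ISPs from which a CEE host's Locators might be drawn. It shows the simple masking test an ACL performs, and how a Locator-based ACL fails when the site adds an ISP.

```python
import ipaddress

def acl_permits(source: str, acl: list) -> bool:
    """Simple ACL check: permit if the source falls inside any listed prefix."""
    addr = ipaddress.ip_address(source)
    return any(addr in net for net in acl)

# Today: the journal site lists the university's stable prefix (hypothetical).
today_acl = [ipaddress.ip_network("2001:db8:100::/48")]

# CEE: the journal site would have to list every Locator prefix the university
# currently uses - here, PA prefixes from ISP A and ISP B (hypothetical).
cee_acl = [
    ipaddress.ip_network("2001:db8:a::/48"),
    ipaddress.ip_network("2001:db8:b::/48"),
]

print(acl_permits("2001:db8:100::25", today_acl))  # True: stable prefix
print(acl_permits("2001:db8:b::25", cee_acl))      # True: Locator via ISP B
print(acl_permits("2001:db8:c::25", cee_acl))      # False: newly added ISP C
```

The last check fails even though the packet comes from the same university host, so a Locator-based ACL would need updating at every journal site whenever the university's set of ISPs changes.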

There are two significant benefits of a CEE architecture over a CES architecture.

Firstly, CEE architectures are generally able to work without encapsulation, while CES architectures generally require traffic packets to be encapsulated and tunneled towards the destination network - with the Path MTU Discovery problems which result from encapsulation.

Secondly, CEE architectures are argued to involve either no extra complexity in the routing system, or at least less complexity than a CES architecture.

However, Loc/ID Separation, and therefore every CEE architecture, involves a fundamental problem, which is the reason we believe no such architecture can be the basis of the most desirable future architectural enhancement to either IPv4 or IPv6: the extra routing and addressing burdens placed on hosts cannot be justified by any of the advantages CEE architectures have over CES architectures.

To restate this in a different form, we believe that the additional routing system complexity inherent in CES architectures is a worthwhile price to pay in order to continue supporting the current two-level naming structure. Despite being arguably architecturally inelegant, the current two-level naming system is in practice faster and more efficient, and much less burdensome for all hosts, than the Loc/ID Separation alternative.

In the current naming model, when host A sends a packet with destination address X, the routing system will either drop the packet or deliver it to the host whose IP address is X. Except in the case of anycast, the address X uniquely identifies a particular destination host. So host A's act of placing IP address X into the destination field of a packet and sending it to the routing system typically achieves the most obvious goal - having the packet delivered to the host identified by IP address X - and always achieves an equally important second goal: that the packet will not be delivered to any host other than the one with IP address X.

To achieve the same assurances with Loc/ID Separation, before the packet is sent the sending host must look up the Identifier of the destination host in the ID->Loc mapping system (unless it has cached the results of a previous lookup) and wait for a map reply, which contains the one or more Locators which are currently potentially valid for the destination host.

With today's naming model, a typical two-way initial packet exchange involves host A obtaining an IP address for host B, perhaps via a DNS lookup; host A sending a packet with this address in the destination field; and host B using the address in the source field of this packet as the destination address of the reply packet - sending the packet back to A without any kind of lookup or delay. While B cannot be sure the first packet was sent by A, the routing system ensures that its reply can only go back to host A.

With Loc/ID Separation, the DNS lookup or some other mechanism provides host A with an Identifier for host B. A then looks up this Identifier in the ID->Loc mapping system and so gains a Locator, which is needed for the destination address field of the packet. When B receives the packet, it typically needs to look up A's Identifier in the mapping system in order to find a currently valid Locator to place in the destination field of the reply packet.

This is required in any scenario where B must know the identity of the host to which it is replying. This is not always the case, but it frequently is - and for any CEE architecture involving unmodified IPv6 applications, it must be the case, since the application may be assuming that its reply packet can only be received by the host it understands to be identified by the IPv6 address it places in the destination field of the packet it hands to the stack for transmission.
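The difference between the two exchanges described above can be sketched as a simple timing model in Python. This is an illustration under stated assumptions, not a measurement: processing time is ignored, neither CEE host has a cached mapping (the "two lookups" case), each one-way host-to-host trip is taken as half the round-trip time, and the round-trip times passed in (host-to-host, DNS and ID->Loc mapping query) are hypothetical values chosen only to show how the lookups add up.

```python
def first_reply_time_ms(rtt_host: float, rtt_dns: float, rtt_map: float,
                        cee: bool) -> float:
    """Time from A starting its DNS lookup until B's reply reaches A.

    Assumptions: no processing delay, no cached ID->Loc mappings, and each
    one-way host-to-host trip takes half of rtt_host.
    """
    t = rtt_dns              # A resolves B's name (both models)
    if cee:
        t += rtt_map         # A looks up B's Identifier to get a Locator
    t += rtt_host / 2        # A's first packet reaches B
    if cee:
        t += rtt_map         # B looks up A's Identifier before replying
    t += rtt_host / 2        # B's reply reaches A
    return t

# Hypothetical figures: 80 ms host RTT, 30 ms DNS RTT, 100 ms mapping RTT.
today = first_reply_time_ms(80, 30, 100, cee=False)
loc_id = first_reply_time_ms(80, 30, 100, cee=True)
print(today, loc_id)  # 110.0 310.0
```

Under these assumptions the two mapping round trips add twice `rtt_map` to the first exchange; a slower or lossier mapping query widens the gap further.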
All CEE proposals involve a global ID->Loc mapping system - a query-based system which is global in size and has one or a small number of authoritative servers for each query. There is typically little or no scope for caching these replies at any intermediate query servers, since it is important that the replies contain fresh information, and they typically have short caching times at the querier. The global scale of these systems, as with DNS (other than with multiple anycast servers), involves inherently long potential delays and inherently greater risks of packet loss. None of these CEE proposals involves a global ID->Loc mapping system with local or nearby full-database query servers which could resolve these queries highly reliably and with delays which are always short enough to be insignificant to the applications. Some CEE proposals involve the initial DNS lookup providing Locators as well as Identifiers. However, these approaches have not yet been shown to support the resolution of an FQDN into multiple Identifiers, each with its own set of Locators. Nor can DNS be assumed to be the only frequently used approach to the commencement of a new communication session. The extra delays typically inherent in these ID->Loc lookups constitute a serious enough burden on all communications for us to argue that it would be better to implement the extra routing system complexities inherent in CES (with a suitably fast mapping system) in order to avoid these delays. Furthermore, all CEE architectures involve extra packets from and to the hosts due to these lookups, and in some cases in the form of extra host-to-host communications. Some CEE architectures involve longer packets being sent between the hosts, with the inclusion of Identifiers in some or all packets. For instance, Name Based Sockets involves sending potentially long FQDNs as Identifiers in initial packets sent between the hosts, and RANGI involves a 36 byte Destination Options Header on all packets.
GLI-Split and ILNP do not involve longer packets being sent between hosts. However, GLI-Split involves significant extra functionality in routers, and ILNP has optional address rewriting by end-user network border routers. CEE architectures require hosts to send map request packets and wait for the corresponding replies. Hosts must store more state than at present and must send and receive more bytes in order to achieve initial and sometimes continuing communications. When there is a multihoming link failure, or a mobility change in Locator, each individual host is required to securely notify all its correspondent hosts of the changed circumstances. There are scaling problems in doing this if there is a large number of correspondent hosts. This Recommendation is based on the position that these burdens on all hosts - extra traffic, state and CPU resources - are undesirable and cannot be justified by the avoidance of the additional routing system complexity which CES architectures entail. In particular, we argue that the imposition of mapping query and response packets on mobile hosts is even more unacceptable than for hosts with fast, inexpensive, reliable connections. In the foreseeable future, the majority of hosts will probably be connected via 3G wireless links, which frequently suffer from high latency and limited bandwidth. The inherent unreliability of packet transmission in wide area radio networks is compounded by the likelihood that the link itself will be congested with application traffic - further increasing the risk that map request and reply packets will be dropped. There is a counter argument that a CES architecture also needs mapping lookups concerning both hosts in an initial two-packet exchange. There is some validity to this, but there are three important distinctions which mean the concerns remain.
Firstly, in a CEE architecture, all Internet hosts must adopt the new system before there are significant benefits to adopters or to scalable routing. This means all host-to-host communications use Locator / Identifier Separation and so, typically, require ID->Loc lookups at both ends. With CES, only a subset of hosts adopt the new scalable "edge" space. All hosts run by ISPs, and all end-user hosts on PA addresses, will continue to use "core" addresses - which require no lookups. It is reasonable to expect the great majority of domestic Internet services, and many business services, to remain on PA space - since they do not want or need Portability, Multihoming or inbound TE. Packets sent to the numerous hosts on these PA end-user networks do not require any lookups. Secondly, even if both hosts in a CES system are on "edge" space, there are two ways in which these "edge" to "core" mapping lookups differ from those in a CEE architecture. The CES devices doing the lookups are ITRs (Ingress Tunnel Routers), which are typically well connected. ITRs are not normally linked to the rest of the Net via slow, potentially unreliable links such as those most mobile devices will be relying upon. This reduces both the direct mapping lookup delays and further delays due to packet loss. The other difference is that ITRs typically serve the needs of multiple sending hosts - and so are more likely to have the required mapping already cached, avoiding any lookup or delay. Thirdly, in a CES architecture, a mapping reply may, and often will, return a range of "edge" addresses to which the same mapping applies. So a single enquiry about one address may return information which removes the need to send a query about another address within this range. CEE Identifier -> Locator mapping has no such mechanism, so every Identifier needs a separate lookup.
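The range-based caching advantage in the third point can be sketched as follows. This is a minimal, hypothetical Python model of an ITR cache which holds IPv4 mapping replies as (start address, length, ETR) entries. The class name, the assumption that cached ranges never overlap and the documentation-range addresses are illustrative only, not taken from any proposal's specification.

```python
import bisect
import ipaddress


class RangeMappingCache:
    """Hypothetical ITR mapping cache keyed by non-overlapping IPv4 address ranges."""

    def __init__(self):
        self._starts = []   # sorted range start addresses, as integers
        self._entries = []  # parallel list of (start, length, etr_address)

    def insert(self, start, length, etr):
        s = int(ipaddress.IPv4Address(start))
        i = bisect.bisect_left(self._starts, s)
        self._starts.insert(i, s)
        self._entries.insert(i, (s, length, etr))

    def lookup(self, addr):
        """Return the ETR for addr, or None if a Map Request would be needed."""
        a = int(ipaddress.IPv4Address(addr))
        i = bisect.bisect_right(self._starts, a) - 1  # last range starting at or before a
        if i >= 0:
            start, length, etr = self._entries[i]
            if a < start + length:
                return etr
        return None


cache = RangeMappingCache()
# One mapping reply covering 14 "edge" addresses: 192.0.2.16 to 192.0.2.29.
cache.insert("192.0.2.16", 14, "198.51.100.1")
print(cache.lookup("192.0.2.20"))  # 198.51.100.1 - no further query needed
print(cache.lookup("192.0.2.30"))  # None - outside the cached range
```

A cache keyed by individual host Identifiers, as with CEE mapping, would instead need a separate query for each of the hosts on those 14 addresses.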
Also, in the DRTM approach to real-time mapping, long caching times can be used, reducing the need for repetitive lookups - since DRTM supports a later Cache Update being sent to the querier as soon as the responding server learns of a change to mapping which was previously sent in a mapping reply. No CEE proposal provides such a facility - so in order for sending hosts to have up-to-date Locator information, caching times must be short, which frequently necessitates repeated requests. Also, since no CEE mapping systems utilize "local" or "nearby" query servers, the time delays and the total effort in handling these query and response packets are greater than with a CES system which uses longer caching times and nearby authoritative query servers. These problems of host burdens - extra delays, extra traffic, extra state etc. - are inherent in all CEE architectures. Because we believe the routing system should be constructed to serve the reasonable needs of hosts, and because we believe that a CES enhancement is capable of doing so for scalable routing and global mobility, we believe that no CEE architecture - no architecture which involves Locator / Identifier Separation - could be the best approach to upgrading the Internet architecture. If CEE architectures were inherently easier to have widely adopted than CES architectures, a difficult decision would have to be made - pitting ease of adoption against long-term benefits. However, since all CEE architectures return substantial benefits to adopters and to scalable routing only after nearly universal adoption, and since CES architectures provide immediate, full benefits to adopters and continual scalability benefits in proportion to adoption, there is no obvious reason why CEE architectures should be adopted in preference to a good CES architecture.
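The following minimal Python sketch models the DRTM behaviour described above: a querier caches mapping for a long time, and the authoritative server pushes a Cache Update, carrying the nonce from the original Map Request, when the mapping changes. It is an in-process toy, not the DRTM protocol. The class and method names, the 600 second caching time and the 8-byte nonce are assumptions for illustration, and it omits packet formats, retransmission and the responder liveness checks DRTM also needs.

```python
import secrets
import time


class AuthoritativeServer:
    """Hypothetical authoritative query server which remembers who it answered."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)  # micronet name -> ETR address
        self.queriers = {}            # micronet -> list of (querier, nonce, expiry)

    def map_request(self, querier, micronet, nonce, cache_secs):
        expiry = time.monotonic() + cache_secs
        self.queriers.setdefault(micronet, []).append((querier, nonce, expiry))
        return self.mapping[micronet]

    def change_mapping(self, micronet, new_etr):
        self.mapping[micronet] = new_etr
        now = time.monotonic()
        # Only queriers whose cached copy has not yet expired need a Cache Update.
        live = [q for q in self.queriers.get(micronet, []) if q[2] > now]
        self.queriers[micronet] = live
        for querier, nonce, _ in live:
            querier.cache_update(micronet, new_etr, nonce)


class CachingQuerier:
    """Hypothetical ITR or QSR which caches mapping for long periods."""

    def __init__(self, server, cache_secs=600):
        self.server = server
        self.cache_secs = cache_secs
        self.cache = {}  # micronet -> (etr, nonce, expiry)
        self.requests_sent = 0

    def resolve(self, micronet):
        entry = self.cache.get(micronet)
        if entry is None or entry[2] <= time.monotonic():
            nonce = secrets.token_bytes(8)
            self.requests_sent += 1
            etr = self.server.map_request(self, micronet, nonce, self.cache_secs)
            entry = (etr, nonce, time.monotonic() + self.cache_secs)
            self.cache[micronet] = entry
        return entry[0]

    def cache_update(self, micronet, etr, nonce):
        entry = self.cache.get(micronet)
        # Accept the update only if it carries the nonce of our own Map Request.
        if entry is not None and secrets.compare_digest(entry[1], nonce):
            self.cache[micronet] = (etr, nonce, entry[2])


qsa = AuthoritativeServer({"micronet-1": "203.0.113.1"})
itr = CachingQuerier(qsa)
itr.resolve("micronet-1")  # one Map Request is sent
qsa.change_mapping("micronet-1", "203.0.113.2")
print(itr.resolve("micronet-1"), itr.requests_sent)  # 203.0.113.2 1
```

The second `resolve()` returns the new ETR address without a second Map Request, because the change reached the cache as a pushed Cache Update. A host which could only rely on short caching times would have had to query again.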
17.5 - Core-Edge Separation architectures
-----------------------------------------

CES architectures are applicable to both IPv4 and IPv6 and require no alterations to host stacks or applications. Unlike CEE architectures, a well implemented CES architecture is capable of supporting all of an adopting network's traffic for Portability, Multihoming and inbound TE. This includes packets sent by hosts without any upgrades from networks without any upgrades. This results in immediate full benefits to all adopters, irrespective of how many other networks adopt the CES architecture. While a CEE architecture only provides substantial routing scaling benefits once all, or almost all, hosts adopt the new system, CES architectures provide routing scaling benefits in direct proportion to their level of adoption. When an end-user network which currently has no PI prefixes adopts a well implemented CES architecture, it gains the benefits of Portability, Multihoming and inbound TE for the first time. Alternatively, if the adopting network already has PI space, it retains these benefits while reducing the burden on the DFZ control plane. The network may convert its existing PI prefix to "edge" space, and then divide it freely into potentially numerous separately mapped portions of "edge" space, each of which can be used at an ETR at any ISP. Meanwhile, the single original, short prefix is all that is advertised in the DFZ. The network may also relinquish some of its PI space, returning the no-longer needed portion to the RIR (and so helping reduce the impact of the IPv4 address shortage) while retaining a longer prefix as "edge" space, as just described. This is on the frequently realistic assumption that its sites can operate with fewer than the 256 IPv4 addresses (a /24) which is the current minimum number which can be used with conventional PI approaches.
Alternatively, the network may relinquish all its PI space, removing one or more prefixes from the DFZ, and lease "edge" space from another company which advertises one or more shorter prefixes in the DFZ, with each such prefix providing "edge" space for thousands or tens of thousands of end-user networks. Another primary benefit of CES over CEE architectures is that they maintain the current two-level naming model of IPv4 and IPv6. As discussed in the previous section, this model results in faster establishment of communications and less load on all hosts than the CEE alternative of "Locator / Identifier Separation". Of the four CES proposals, two can be discounted from being the best choice for IETF development in the foreseeable future. TIDR must be discounted because it does not really solve the key problem of scalable routing - removing the burden of more numerous end-user prefixes from the BGP control plane. This is because TIDR's equivalent of mapping information is carried by DFZ routers - in a form closely resembling the way PI prefixes are advertised. Suitably modified DFZ routers are required to propagate this information, and the FIBs of these DFZ routers do not need entries for each individual end-user "edge" prefix. However, the burden on the DFZ's BGP control plane is hardly lessened by TIDR. IRON-RANGER is an innovative CES architecture which at present has unresolved questions surrounding the scalability and security of the process by which millions of ETR-like routers advertise the "edge" prefixes they are currently handling to one or potentially many "Virtual Prefix" routers, which could be located anywhere on the Net.
Since this process involves repeated and continual communications going out from the ETR-like routers to these VP routers, and since the arrangements by which this is to be achieved are currently not well described, it is not clear that this process can scale well or be performed securely enough to result in a system which would be competitive with LISP or Ivip. Two other concerns about IRON-RANGER are that it does not seem to support TTR Mobility or to provide full support for packets sent from non-upgraded networks. (See msg06214 for further discussion.) LISP - and in particular LISP-ALT - is the best known and most developed CES architecture, with a small test network, an IETF WG and several teams implementing code which is intended to interwork smoothly. However, despite more than two years of work, the LISP-ALT mapping system involves unresolved questions about scaling, potentially long delays and its ability to be structured to avoid single points of failure and unacceptable concentrations of traffic. Other concerns about LISP include the complexity of its ITRs and ETRs, difficulties in the ITR being able to reliably ascertain the reachability of the destination network through an ETR, and difficulties in having ETRs enforce any ISP border router filtering based on the source address of packets arriving from the DFZ. LISP lacks a mapping system which can provide its ITRs with the nearly instant responses which are required for a CES system not to impose significant delays on the establishment of initial communications. Since a scalable routing system can only become widely enough adopted to solve the problem through purely voluntary adoption, the solution must be perceived as attractive by end-user networks of all sizes if it is to be successful.
Any significant inherent delays, even if only on a fraction of initial communications, would put the system at risk of not being attractive enough to larger networks (which can obtain PI space) for it to be adopted widely enough to substantially solve the routing scaling problem. The DRTM (Distributed Real Time Mapping) system was designed for Ivip, and is intended to be applicable to LISP - but there is no assurance that it would be adopted in favour of ALT or any other mapping systems the LISP developers are currently working on. LISP can support TTR Mobility. However, its currently non-real-time mapping system would mean that mobile node tunnels to TTRs would need to be retained longer than with a CES architecture such as Ivip which has a real-time mapping system. Since its inception in mid-2007, Ivip has had a real-time mapping system. Until February 2010, this involved a somewhat decentralised, but still coordinated, system of "Replicator" servers which fanned mapping updates out in real time to full-database "local" query servers in ISPs which chose to install ITRs. While this would still be possible, the new arrangement for Ivip is to adopt DRTM, in which there is no need for any server to be full-database for all "edge" space, and in which ISPs and end-user networks with ITRs have their ITRs gain mapping from one or more purely caching Map Resolvers they install in their networks. Each such Map Resolver - a Resolving Query Server (QSR) - queries typically "nearby" Authoritative Query Servers (QSAs), which carry the real-time updated full mapping databases for the subset of the "Mapped Address Block" (MAB) prefixes which they serve. The MABOCs (MAB Operating Companies), which lease space in their MABs to thousands or millions of end-user networks, typically need to establish a number of DITR (Default ITR in the DFZ) sites all over the Net in order to handle packets sent by hosts in networks without ITRs.
These sites also host the authoritative QSAs which QSRs in ISP and end-user networks query. While these DITR-site QSAs may, as an option, push streams of mapping updates to QSRs, which would then maintain full mapping databases for some or all MABs, this is no longer a requirement of Ivip - and it appears that this will not be needed for a full-scale deployment, including for mass-adoption mobility. DRTM provides end-user networks with real-time control (total latency of two seconds or less) of the tunneling behavior of all ITRs in the world which are handling packets addressed to their "edge" address space. Ivip requires end-user networks to be responsible for detecting reachability problems in their multihoming arrangements and for making the necessary mapping change decisions. For conventional multihoming, most end-user networks would contract a specialised company to do this monitoring and to control their mapping. This can be combined with real-time control of inbound TE according to priorities sent from the end-user network to the external organisation which it has appointed to control its mapping. This organisation would use inbound TE commands to control mapping as long as the network was reachable via all ETRs. When one or more ETRs become unusable, multihoming service restoration would take priority: the mapping would be changed to ensure full connectivity, with inbound TE as a second priority. To date, Ivip is the only CES architecture which modularly externalises the control of mapping from the CES system itself. This enables end-user networks to test reachability and control their mapping as they choose. It also removes the need for ITRs to make any decisions - and so enables the mapping to be greatly simplified: to a single ETR address. Ivip and LISP are the two proposals which appear to have the potential to substantially solve the basic routing scaling problem for both IPv4 and IPv6.
Both are suitable for supporting the TTR Mobility architecture. Real-time, externalised control of ITR tunneling behaviour is central to the benefits and simplifications which are features of Ivip. In the next section we list the individual architectural elements which we believe should be incorporated in a future IETF-developed CES architecture.

17.6 - Desirable CES principles for IETF development
----------------------------------------------------

17.6.1 - Synergies between these elements

A fuller account of these architectural principles is at:

  http://tools.ietf.org/html/draft-whittle-ivip-arch

The Ivip term "Scalable PI" (SPI) space is used below to denote that subset of the global unicast address space which is used for scalable provision to end-user networks of Portability, Multihoming and inbound TE. TTR Mobility also provides SPI space to Mobile Nodes (MNs). In Ivip, a continuous range of SPI space with a single mapping is known as a "micronet". This is defined by an integer starting point and an integer length - it need not be a prefix or have binary boundaries.

17.6.2 - Real time control of ITRs by end-user networks or their appointees

When end-user networks can directly control the way ITRs tunnel packets addressed to their SPI space sites - and when they can delegate this control to other organisations - a number of benefits are achieved. The mapping can be simple - a single ETR address. ITRs need make no decisions between multiple ETR addresses, or do any tests of reachability of destination networks through ETRs. The ITR to QSR (Resolving Query Server) protocol (perhaps via one or more QSC caching query servers) and the QSR to "nearby" QSA protocols may be similar or identical, and should both support Cache Updates being sent after the initial Map Reply, secured by the nonce in the original Map Request query.
This is so that mapping for a micronet which is cached in a QSR, QSC or ITR will be updated within a fraction of a second of the real-time mapping change arriving at the authoritative QSA query server at the DITR-site. This needs to be supplemented by the querier periodically checking that the responder is still alive and has not been rebooted - and invalidating any cached mapping from that responder if it is unreachable or has been rebooted. The protocol also enables QSRs to find the one, two or more generally "nearest" QSAs which are authoritative for each MAB. The protocol will include mechanisms for these query servers to return, in their Map Reply and probably Cache Update messages, some information which may direct or encourage Map Resolvers to use another QSA instead - as a method of dynamically load balancing the queries between multiple QSAs. With real-time mapping changes to the QSAs driving Cache Updates going to the ITRs which need them, the CES system can be used more flexibly than if ITRs could only be given fresh mapping on a non-real-time basis - and so had to test reachability and choose between ETR addresses on their own. Real-time mapping enables TTR Mobility to be somewhat simpler. Mapping changes are not frequently required in TTR Mobility - they are typically only desirable if the MN moves more than 1000 km or so, and are not required at all due to the MN gaining a new address or access network. Real-time control enables MNs to terminate their tunnels to the previously used TTR within a few seconds, rather than retain them for a longer time, as would be required if ITRs could only receive mapping information on a non-real-time basis.

17.6.3 - Mapping resolution

The resolution of the mapping system, and therefore the granularity with which micronets (contiguous ranges of SPI space with a single mapping) can be specified, should be single addresses for IPv4 and /64 prefixes for IPv6.
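Because micronets are integer ranges rather than prefixes, the number of mapping entries can be lower. The following minimal Python sketch uses the standard library's `ipaddress.summarize_address_range()` to count how many binary-boundary prefixes are needed to express the 16-address example given in this section, compared with two integer-range micronets. The 192.0.2.0/28 block is a documentation range chosen for illustration, and the sketch assumes the two separately mapped addresses are at the start of the block.

```python
import ipaddress


def as_prefixes(first, count):
    """Cover count consecutive IPv4 addresses from first with the fewest prefixes."""
    lo = ipaddress.IPv4Address(first)
    return list(ipaddress.summarize_address_range(lo, lo + (count - 1)))


# Hypothetical split of 192.0.2.0/28 (16 addresses): 2 addresses, then the other 14.
micronets = [("192.0.2.0", 2), ("192.0.2.2", 14)]
prefix_micronets = [p for first, count in micronets for p in as_prefixes(first, count)]

print(len(micronets), len(prefix_micronets))  # 2 4
print([str(p) for p in prefix_micronets])
# ['192.0.2.0/31', '192.0.2.2/31', '192.0.2.4/30', '192.0.2.8/29']
```

If the two separately mapped addresses sat in the middle of the block rather than at its start, the prefix-based count would generally rise further, while the integer-range approach would need at most three micronets.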
A /64 should be sufficient as a minimal address allocation for any MN or site, and there is no reason to require ITRs' FIBs to work with more than the 64 most significant address bits. Integer ranges of these units enable more flexible utilization than reliance on traditional binary-boundary prefixes. This would frequently result in better space utilization and a reduction in the number of micronets. For instance, if micronets were instead specified purely as prefixes, and an end-user network had 16 IPv4 addresses initially in a single micronet, this could be covered by a single prefix-based micronet. If they wanted to use two of these addresses in a separate micronet, this would require a second micronet for the two addresses, with the remaining 14 addresses covered by micronets of 8, 4 and 2 addresses each - a total of 4 micronets. With arbitrary integer lengths, there would only be two micronets - one of 2 IPv4 addresses and the other of 14.

17.6.4 - Flexibility of ITR and ETR placement

The CES architecture should allow ITRs to be placed flexibly in ISP and end-user networks, including on SPI addresses. ITR functions in hosts should be an option - again for hosts on ordinary "core" addresses and on SPI "edge" addresses. ITR functions in hosts behind NAT should be considered as an option. ETRs should not be assumed to be run by ISPs or by end-user networks - either arrangement should be possible. ETRs will always be on "core" addresses - and so will never be behind NAT or another ETR. With proper flexibility, no matter whether an ETR is run by an ISP or by the SPI-using end-user network, the flow of packets from other parts of an ISP network to this end-user network will be via a nearby ITR (perhaps including a DITR outside the ISP) and then to the ETR. This means that ISP networks need not carry the SPI space in their internal routing systems.
For non-mobile end-user networks, which may run their own ETRs on the PA space they already obtain as part of their Internet service, the only step their ISP needs to take in order to allow them to use SPI space is to accept outgoing packets from these networks, to be forwarded onwards, including to the rest of the Internet, when the source address of these packets is from the "edge" (SPI) subset of the global unicast address range. There are no such requirements of an ISP or access network used by a Mobile Node using TTR Mobility, since the MN tunnels to its one or more typically nearby TTRs, and sends all outgoing packets to the TTRs, rather than relying on its access network to forward these outgoing packets with SPI source addresses.

17.6.5 - No requirement for new host functionality

While ITR functions in hosts should be an option, and while MNs will require tunneling and other software to work with TTR Mobility, no other additional functions should be expected of hosts.

17.6.6 - Modified Header Forwarding

Alternatives to encapsulation-based tunneling have been proposed for IPv4 and IPv6:

  1. ETR Address Forwarding (EAF) - for IPv4. [I-D.whittle-ivip-etr-addr-forw]
  2. Prefix Label Forwarding (PLF) - for IPv6. [http://www.firstpr.com.au/ip/ivip/PLF-for-IPv6/]

These techniques avoid the overheads and PMTUD problems inherent in encapsulation-based tunneling. Both require upgrades to all DFZ routers and some other routers. In the long term, this can probably be achieved with little cost and no disruption. So the CES architecture should be designed to transition to these techniques in the future, whenever these upgrades can be completed. (The mechanisms for this transition are TBD.) Depending on the time-frame of adoption, it may not be out of the question to have all routers updated appropriately before introduction - and thereby avoid the transition and the need to build PMTUD management functions into all ITRs and ETRs.
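To put a number on the encapsulation overhead which EAF and PLF are intended to avoid, and which plain IP-in-IP encapsulation (17.6.8) would keep to a minimum, the minimal Python sketch below subtracts header sizes from an assumed 1500-byte path MTU. The header sizes are the standard minimum IPv4, fixed IPv6 and UDP header lengths. Any further shim header a particular CES protocol adds after UDP would reduce the figures further, and is not modelled here.

```python
IPV4_HEADER = 20  # bytes, minimum IPv4 header (no options)
IPV6_HEADER = 40  # bytes, fixed IPv6 header
UDP_HEADER = 8    # bytes


def max_inner_packet(path_mtu, outer_ip_header, extra_headers=0):
    """Largest inner packet which still fits the path MTU after encapsulation."""
    return path_mtu - outer_ip_header - extra_headers


for label, outer, extra in [
    ("IPv4 IP-in-IP", IPV4_HEADER, 0),
    ("IPv4 + UDP", IPV4_HEADER, UDP_HEADER),
    ("IPv6 IP-in-IP", IPV6_HEADER, 0),
    ("IPv6 + UDP", IPV6_HEADER, UDP_HEADER),
]:
    print(label, max_inner_packet(1500, outer, extra))
# IPv4 IP-in-IP 1480
# IPv4 + UDP 1472
# IPv6 IP-in-IP 1460
# IPv6 + UDP 1452
```

A 1500-byte packet sent by a host on an Ethernet-connected network therefore no longer fits a 1500-byte path once it has been encapsulated. This is why, without EAF or PLF, ITRs and ETRs need the PMTUD management discussed in 17.6.7.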
17.6.7 - PMTUD management

Some difficult problems need to be solved for Path MTU Discovery when encapsulated tunneling is used, particularly to cope with DF=0 IPv4 packets and to allow the system to take advantage of jumboframe paths across the DFZ as they become available. Ivip's IPTM protocol for doing this is more developed than LISP's. PMTUD problems of CES architectures are discussed in a February 2010 RRG thread [msg05910].

17.6.8 - IP-in-IP encapsulation for traffic packets

With real-time mapping, ITRs no longer need to communicate with ETRs regarding reachability. ITR to ETR communications are limited to the new protocols required for handling PMTUD. In this case, there may be no need for a UDP header followed by another header before the traffic packet - so basic IP-in-IP encapsulation may be employed to minimise encapsulation overhead. There may be an argument for using UDP encapsulation if this is found to be necessary for ITR to ETR tunneling to be compatible with ECMP/LAG.

17.6.9 - ETR support for ISP border router source address filtering

Some ISPs use their border routers (BRs) to drop packets arriving from the DFZ if their source address matches that of any prefix the ISP advertises - its own prefixes or those of any PI-using customers. The only inexpensive, configuration-free approach to having ETRs enforce, on inner packets, any source address filtering imposed by the ISP on packets arriving from the DFZ appears to be to have ITRs tunnel packets with the outer source address being that of the sending host - and then for ETRs to drop inner packets whose inner source address does not match the outer header's source address. This has implications for PMTUD management, and is part of Ivip's IPTM protocol. LISP was designed from the outset to have the outer header's source address being that of the ITR. So the only way a LISP ETR could support this ISP BR source address filtering is to implement it directly on the source address of the decapsulated (inner) packet.
This would be inordinately expensive for large numbers of protected prefixes, and is incompatible with the principle that a packet originating inside the ISP's network, including from one of its customer networks, should be able to be encapsulated by an ITR in this ISP's network and be tunneled to the ETR. Such filtering would cause the ETR to drop the packet, unless the mechanism was elaborated with tests to see whether the packet was encapsulated by an ITR inside or outside the ISP network. This is in fact impossible to ascertain - so the only way of pursuing this approach, with LISP's current arrangement of the outer header source address being that of the ITR, would be to require all the ISP's customers using "edge" (EID) space to have packets from the just described "internal" sources forwarded to them by the ISP's own internal routing system, rather than letting them be handled by ITRs. This appears to involve prohibitive constraints, such as forcing the internal routing system to have routes for every EID prefix used by any network which uses ETRs inside this ISP. This set of EID prefixes could change rapidly, and there are scaling and security challenges in end-user networks communicating their changing EID space usage to their one, two or more ISPs. This approach of avoiding ITRs for packets sent from inside the ISP network would be at odds with the need for portability and multihoming service restoration, as well as with the ability of LISP ITRs to control inbound TE. These need to be controlled by the LISP ITRs alone. So it appears that the only way for ETRs to support ISP BR filtering is for ITRs to tunnel their packets as Ivip ITRs do - with the outer source address being that of the sending host.
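The filtering arrangement argued for above can be sketched as two simple checks. In this minimal Python sketch, the border router filters on the outer source address of packets arriving from the DFZ, and the ETR, after decapsulation, drops any packet whose inner source differs from its outer source. The protected prefix, the addresses (all documentation ranges) and the function names are hypothetical, chosen only to show why the combination lets ETRs enforce the BR filter on inner packets without per-prefix configuration.

```python
import ipaddress

# Hypothetical prefixes the ISP's border routers protect (its own and PI customers').
PROTECTED = [ipaddress.ip_network("203.0.113.0/24")]


def border_router_accepts(outer_src):
    """BR filter on packets from the DFZ: drop sources inside protected prefixes."""
    src = ipaddress.ip_address(outer_src)
    return not any(src in net for net in PROTECTED)


def etr_accepts(outer_src, inner_src):
    """ETR check after decapsulation: the inner source must equal the outer source."""
    return ipaddress.ip_address(outer_src) == ipaddress.ip_address(inner_src)


def delivered_from_dfz(outer_src, inner_src):
    """A tunneled packet from the DFZ is delivered only if both checks pass."""
    return border_router_accepts(outer_src) and etr_accepts(outer_src, inner_src)


spoofed = "203.0.113.9"    # inner source forged from a protected prefix
itr_addr = "198.51.100.1"  # address of an ITR outside the ISP
# Outer source = sending host's (forged) address: the BR drops the packet.
print(delivered_from_dfz(spoofed, spoofed))          # False
# Using a different outer source does not help: the ETR drops the mismatch.
print(delivered_from_dfz(itr_addr, spoofed))         # False
# Legitimate traffic with matching addresses is delivered.
print(delivered_from_dfz("192.0.2.5", "192.0.2.5"))  # True
```

The second case also shows why this cheap check is unavailable to an ETR when the outer source is always the ITR's address: the equality test would then drop every legitimate tunneled packet as well.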
17.6.10 - Minimise typical mapping delays by use of nearby query servers

While "local" full-database query servers in ISPs can reliably and quickly return mapping replies, and so reduce the initial packet delay problem to tens of milliseconds - short enough for ITRs to buffer traffic packets while awaiting the map reply - these local full-database query servers, and the continual stream of real-time mapping information they require, have raised objections. With DRTM, these arrangements are no longer necessary. DRTM uses "nearby" authoritative QSA query servers which are full-database for a subset of the MABs in the entire system - where "nearby" means within 5000km (25ms in fibre) or so. MABOCs will need to reach out with widely distributed DITR-sites and will provide authoritative query servers at these sites. Each MABOC - or at least each company which runs the DITR sites for one or more MABOCs - will therefore provide multiple QSAs for all MABs served by these DITR sites. With dynamic load balancing between these QSAs, this arrangement can be expected to scale well to the largest imaginable deployments. The question of how these QSAs reliably and securely gain their real-time mapping updates can be solved by the MABOCs or the companies which run the DITR sites. Private network links and proprietary protocols can be considered, since this real-time mapping distribution takes place entirely within one organisation, or within one organisation driven by data supplied by one or more trusted and closely related MABOC organisations. There are no obvious reasons why this cannot be done - and the concerns which applied to Ivip's previous global "Replicator" system do not apply to these DRTM arrangements.
This approach, which is part of DRTM, does not absolutely ensure that the nearest authoritative query server an ISP's Map Resolver finds for a given MAB is "nearby" - such as less than 5000km distant, and ideally closer in densely populated regions such as Europe, North America and much of Asia. However, in general this will be the expected result, since MABOCs will be motivated to establish multiple widespread sites to handle these queries and to perform DITR operations in order to handle traffic addressed to the SPI addresses of their end-user customers.
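The "nearby" figure of 5000km (about 25ms in fibre) follows from light in fibre covering roughly 200km per millisecond, so a Map Request and its reply to such a QSA take about 50ms in propagation delay alone. The minimal Python sketch below shows that arithmetic, plus one possible way a Map Resolver might choose among candidate QSAs for a MAB. The `load_hint` field stands in for the load-balancing information a QSA may return in its Map Reply or Cache Update messages. Its 0 to 1 scale, the selection rule and the QSA names are assumptions for illustration, not part of DRTM as described.

```python
FIBRE_KM_PER_MS = 200.0  # approximate: light in fibre covers ~200 km per millisecond


def one_way_ms(distance_km):
    """Approximate one-way propagation delay in fibre, ignoring queuing and detours."""
    return distance_km / FIBRE_KM_PER_MS


def pick_qsa(candidates, max_km=5000.0):
    """Pick a QSA from hypothetical (name, distance_km, load_hint) tuples.

    QSAs within max_km are preferred; among them the lowest load hint wins,
    with distance as the tie-breaker. If none is nearby, all candidates are used.
    """
    nearby = [c for c in candidates if c[1] <= max_km]
    pool = nearby or candidates
    return min(pool, key=lambda c: (c[2], c[1]))[0]


print(one_way_ms(5000), 2 * one_way_ms(5000))  # 25.0 50.0
print(pick_qsa([("qsa-a", 800, 0.9), ("qsa-b", 700, 0.4), ("qsa-c", 6300, 0.1)]))  # qsa-b
```

Here the lightly loaded but distant "qsa-c" is passed over in favour of a nearby QSA, reflecting the priority on short delays. A real Map Resolver would measure delays rather than rely on assumed distances.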