[rrg] Ivip map-encap & MHF in draft-irtf-rrg-recommendation-00
Robin Whittle <rw@firstpr.com.au> Tue, 24 February 2009 10:45 UTC
Message-ID: <49A3CF70.2020602@firstpr.com.au>
Date: Tue, 24 Feb 2009 21:44:00 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: RRG <rrg@irtf.org>
Here are some suggestions for improving:

  http://tools.ietf.org/html/draft-irtf-rrg-recommendation-00

to properly describe the architectural elements which are part of the Ivip proposal. This is an update of:

  Re: [rrg] Summary of architectural solution space - Ivip still isn't properly covered (2008-12-21)
  http://www.irtf.org/pipermail/rrg/2008-December/000526.html

I think there are multiple problems with section 3.3. As it stands at present, it is based on Bill Herrin's page (sixth draft):

  http://bill.herrin.us/network/rrgarchitectures.html

You can see from my attempt to match actual proposed architectures to this taxonomy:

  http://www.firstpr.com.au/ip/ivip/rrgarch/

that I think this taxonomy needs quite a lot of work to ensure it properly covers the most prominent proposals.


3.3.1.1. Variants

I think A1c or perhaps A1b best describes Ivip. I am not sure if A1a or perhaps A1b applies to any architecture which has been proposed.


3.3.1.2. Mapping approaches

As I wrote here:

  http://www.irtf.org/pipermail/rrg/2008-December/000526.html

I think none of A2a, A2b or A2c adequately describe Ivip. A2b comes closest, but it is not true to say of Ivip that:

   "The registry pushes all incremental changes in near-real time to all encoders which add RLOCs to the packets."

That implies all ITRs get all mapping changes, which is not the case with Ivip - or with APT. I think A2b does not match any proposed architecture.

I suggest a new item be added (or maybe to replace A2b):

   A2d. GUIDs dynamically mapped to each RLOC are pushed towards a central or distributed registry as they change. The registry pushes all incremental changes in near-real time to all full database query servers in ISP and/or end-user networks. The encoders request mapping from these local query servers. The response has a caching time and the local query server will push any changed mapping to an encoder if it receives such a change for mapping which matches a recent query which is still within its caching time.
   There may be one or more full database query servers in each ISP and there may be one or more layers of caching query servers between these and the encoders.

For brevity, I am leaving out other important things such as it being possible for an end-user network to have its own query servers, including full-database and/or caching, for its own ITRs - and the option of having the ITR function in sending hosts (not behind NAT).

I think APT should have its own separate description.


3.3.1.3. Failure handling approaches

I suggest A3b be replaced by:

   A3b. End-user networks are responsible for controlling the mapping of their EID address space. They may do this directly or they may contract a third party to do this for them. Each EID prefix (or non-prefix, contiguous, range of address space: one or more contiguous IPv4 addresses or IPv6 /64s) is mapped to a single RLOC address. With strategy A2d, for reasons such as a multihoming link failure, to implement inbound traffic engineering, or to implement address portability when moving to another ISP, the end-user network causes the mapping to change to the new RLOC address, and this is conveyed to all full database query servers in near real time. These push the changed mapping to any encoders which may need it, based on previous queries and the caching times of the responses.


3.3.1.4. Compatibility approaches

The current A4d is a rough description of LISP's approach to PMTUD. It does not adequately describe Ivip's approach. I suggest this new version of A4d to make it clearer that this is one of at least two ways of dealing with the PMTUD problems caused by encapsulation:

   A4d Standard IPv4 and IPv6 packets are tunnelled while they transit the Internet core, by encapsulating them with a header in which the source address is that of the encoder.
   Path-MTU problems in routers between the encoder and decoder are handled by sending a Packet Too Big (PTB) to the encoder which must construct a valid PTB message and send it to the originating host. As noted below {pointing to the new text below I suggest for 3.3.1.6} this requires the encoder to retain significant state, the encapsulation header to include a nonce and the original PTB to contain enough of the offending packet to include the nonce.

(I guess APT would use much the same approach as LISP's, although the work would be split in some way between the ITR and Default Mapper, whereas in LISP it is all done by the ITR - "encoder" in this document.)

I think this is a good, terse, description of LISP's stateful approach to PMTUD. However, to keep it short, I haven't mentioned one important thing - that this approach is unlikely to support hosts sending fragmentable packets longer than some length a little less than 1500 bytes. We don't want jumboframe-inclined hosts sending 9000 byte fragmentable packets in a world where DFZ links often or even sometimes have 1500 byte MTUs, the ITR dutifully encapsulating them and then having them fragmented into 8 or more pieces there or en-route to the ETR.

I suggest this new item be added. I will call it A4h, but these should probably be reordered in the final draft so the following comes after A4d:

   A4h Standard IPv4 and IPv6 packets are tunnelled while they transit the Internet core, by encapsulating them with IP-in-IP, with no other header information and with the source address of the outer header being that of the originating host. The encoder maintains an upper and lower estimate of the PMTU to each decoder it is currently tunneling packets to. When a traffic packet arrives at the encoder which, with encapsulation overhead, would have a length exceeding the upper estimate of the PMTU to the relevant decoder, the encoder sends a PTB to the originating host.
   When the encapsulated length is less than the lower estimate, the encoder encapsulates and tunnels the packet normally. When the encapsulated length falls between the two estimates, the encoder splits the packet and sends it as two. One short packet is encapsulated with the outer source address being that of the original sending host. The other packet has the same length as the traffic packet would have if it were normally encapsulated. Both packets contain a nonce and potentially other information to enable the decoder to positively acknowledge the correct receipt of both packets. The longer packet's outer source address is that of the encoder, so if it exceeds the next hop MTU of an intermediate router, that router will send a PTB to the encoder. That PTB should contain enough of the encapsulation header to include the nonce. This, the positive acknowledgement or lack thereof from the decoder, and subsequent packets from the originating host will enable the encoder to progressively adjust its lower and upper estimates of PMTU for this decoder until they match. The encoder is able to send a valid PTB to the sending host if the pair of packets did not successfully reach the decoder.

I know this is longish, but map-encap's PMTUD problems are difficult to solve. This is a really short description of the best approach I can think of - actually the only one I think which could work. This is more fully developed than LISP's stateful approach. It avoids sending nonces or any other header guff beyond basic IP-in-IP encapsulation for the great majority of packets. Only those few packets whose length falls within the current zone of uncertainty (between the lower and upper estimates of PMTU) are given this special dual-packet treatment. Every such treatment will enable the ITR to narrow the zone of uncertainty by either raising its lower estimate of PMTU or dropping its upper estimate.
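
To make the A4h narrowing procedure a little more concrete, here is a minimal Python sketch of the encoder-side bookkeeping only. It is an illustration under stated assumptions, not Ivip's specified behaviour: the class and method names are hypothetical, the 20 byte overhead assumes IPv4-in-IPv4 with no options, and treating a length exactly equal to the lower estimate as safe is my own choice. The dual-packet probe, the nonce and the acknowledgement protocol are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

ENCAP_OVERHEAD = 20  # assumed: IPv4-in-IPv4 outer header, no options


@dataclass
class PmtuEstimate:
    """Hypothetical per-decoder state held by an encoder (ITR)."""
    lower: int  # largest encapsulated length believed to reach the decoder
    upper: int  # largest encapsulated length not yet ruled out

    def classify(self, inner_len: int) -> str:
        """Decide how to treat a traffic packet of inner_len bytes."""
        total = inner_len + ENCAP_OVERHEAD
        if total > self.upper:
            return "send_ptb"      # too long: send a PTB to the sending host
        if total <= self.lower:
            return "encapsulate"   # known to fit: plain IP-in-IP
        return "probe"             # zone of uncertainty: dual-packet probe

    def probe_acknowledged(self, total: int) -> None:
        """The decoder acknowledged a probe of this encapsulated length."""
        self.lower = max(self.lower, min(total, self.upper))

    def probe_failed(self, total: int, reported_mtu: Optional[int] = None) -> None:
        """No acknowledgement, or a PTB arrived (optionally with an MTU value)."""
        limit = total - 1
        if reported_mtu is not None:
            limit = min(limit, reported_mtu)
        self.upper = max(self.lower, min(self.upper, limit))

    def mtu_for_host(self) -> int:
        """MTU value to place in a PTB sent to the originating host."""
        return self.upper - ENCAP_OVERHEAD
```

Each probe either raises `lower` or drops `upper`, so the gap between them can only shrink, which is the self-limiting property described above.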
The A4h text above is a good enough overall description for the taxonomy, but it doesn't describe everything, for instance how the encoder (ITR) needs to periodically test that the current estimates of PMTU are not too high or too low.

Here are improved versions of the descriptions of Ivip's Modified Header Forwarding techniques. I am trying to keep them brief.

   A4f For IPv6, rather than encapsulate the packet to tunnel it to the decoder, use a modified header format in which 19 or 20 bits of the IPv6 header's flow label are used to contain enough information about the RLOC (such as which of a range of DFZ prefixes the RLOC is within) for the packet to be forwarded by suitably modified routers to the correct ISP network for that RLOC. The encoder sets these bits and the decoder restores the original state of the packet.

   A4g For IPv4, rather than encapsulate the packet to tunnel it to the decoder, use a modified header format containing the 30 bit RLOC address of the decoder so suitably modified routers will forward the packet to the decoder. These bits are currently used for the More Fragments flag, the Fragment Offset and the Header Checksum. The encoder sets these bits and the decoder restores the original state of the packet. The encoder does not accept fragments or fragmentable packets longer than some constant, likely to be marginally below 1500 bytes.

The descriptions of these approaches in 3.3.1.x are exceedingly minimal. I don't know why everything has to be so terse. Internet drafts are printed on 100% recycled electrons, and I think it is important to help people - including those outside the RRG - identify and understand the various possible architectures as clearly as possible. The taxonomy should clearly cover every seriously proposed architecture - and should probably not cover anything else without noting that such an architecture may be possible, but has not yet been proposed.
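
As an aside on the A4g bit count: the IPv4 More Fragments flag (1 bit), Fragment Offset (13 bits) and Header Checksum (16 bits) total 30 bits. The Python sketch below shows one way a 30 bit value could be spread across those three fields and recovered. The field ordering, the function names and the requirement that the RLOC address be a multiple of 4 (so that its top 30 bits identify it) are all assumptions made for illustration, not a statement of the actual Ivip header layout. The address used is from the 192.0.2.0/24 documentation range.

```python
import ipaddress
from typing import Tuple

MF_BITS, OFFSET_BITS, CHECKSUM_BITS = 1, 13, 16
RLOC_BITS = MF_BITS + OFFSET_BITS + CHECKSUM_BITS  # 30 bits in total


def pack_rloc(rloc: str) -> Tuple[int, int, int]:
    """Split an IPv4 RLOC into (more_fragments, fragment_offset, checksum) values.

    Assumption: the two low-order address bits are zero, so 30 bits suffice.
    """
    addr = int(ipaddress.IPv4Address(rloc))
    if addr & 0b11:
        raise ValueError("assumed layout needs an address that is a multiple of 4")
    value = addr >> 2                        # the 30 significant bits
    checksum = value & 0xFFFF                # low 16 bits
    offset = (value >> 16) & 0x1FFF          # next 13 bits
    more_fragments = (value >> 29) & 0x1     # top bit
    return more_fragments, offset, checksum


def unpack_rloc(more_fragments: int, offset: int, checksum: int) -> str:
    """Recover the RLOC address from the three header fields."""
    value = (more_fragments << 29) | (offset << 16) | checksum
    return str(ipaddress.IPv4Address(value << 2))
```

For example, `unpack_rloc(*pack_rloc("192.0.2.8"))` gives back `"192.0.2.8"`.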
I think that if the RRG's recommendation (even if it is an interim summary of progress to date) is to have any discussion of the strengths and weaknesses of various approaches, then that discussion should be full and detailed, not just some terse, arbitrarily truncated subset of what a fully detailed list of strengths and weaknesses would be.

I think the best way to do this is for the individual strategies to have a fuller description than at present, including probably their most prominent strengths and weaknesses. Otherwise, many important aspects of the strategies would need to be mentioned in the later evaluation of strengths and weaknesses.


3.3.1.6. Major criticisms

This statement is inadequate:

   Handling path-MTU is a usually problem since the packets in the core are different than the origin host would recognize.

This does not apply to "Map & Forward with Modified Header Forwarding" because the router which creates the PTB message can reconstruct the original state of the packet, and so can generate a perfectly valid PTB which will be recognised by the sending host. Also, since there is no packet overhead, the MTU value returned in the PTB message gives the sending host the correct guidance on how long a packet it can send to avoid MTU problems at that router in the future.

I am wary of the term "genuinely clean", but will retain it in what follows. I suggest this new text for 3.3.1.6:

   There don't appear to be any genuinely clean ways of implementing strategy A.

   Encapsulation-based approaches raise Path MTU Discovery problems for two reasons. Firstly, the encapsulated packet is longer than the packet sent by the sending host, meaning that any MTU value returned by a router in a PTB message does not give the sending host the information it needs to avoid the MTU limitation.
   Secondly, the PTB as generated by the router will not be recognised by the sending host, either due to the encapsulated packet being different to that sent (if the outer source address is that of the sending host) and/or (if the outer source address is that of the encapsulating router) due to the PTB being sent to the encapsulating router, rather than the sending host. In the latter case, a successful solution to the PMTUD problem relies on the encapsulating router (encoder) being able to securely construct a valid PTB to be sent to the sending host. This in turn involves significant processing at that encoder and the retention of a problematic amount of state: for each packet sent which might cause MTU problems. It also relies on the encapsulated packet having a nonce in its headers and the original PTB carrying enough of that packet to include the nonce - which is more than is required by RFC 1191.

   Forwarding with modified headers does not involve the PMTUD and overhead problems which are inherent in encapsulation. The router at which the MTU limit is reached can reconstruct the original packet as sent by the originating host, and so can create a valid PTB. However, forwarding with modified headers involves upgrades to all core routers and some or many internal routers. This is feasible in the long-term, so an encapsulation system could and arguably should migrate to a modified header forwarding system over time. Depending on the flexibility of the installed base of routers and on the urgency of introduction, it may be possible to introduce a scalable routing architecture without using encapsulation by using just modified header forwarding from the outset.

   For IPv4, with either modified header forwarding or encapsulation and the PMTUD management required for encapsulation, it seems unlikely that it will be possible to support fragments, or fragmentable packets longer than a constant which is likely to be somewhat below 1500 bytes.
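
The two encapsulation problems in the suggested text above can be put in numbers. The Python sketch below is illustrative only: the dictionary keys and function names are hypothetical, the header sizes assume no IPv4 options or IPv6 extension headers, and the "UDP plus 8 byte header" entry and the nonce position are assumed layouts rather than a description of any particular proposal. The 8 byte figure is the amount of the offending datagram, beyond its IP header, that an RFC 1191 PTB is required to quote.

```python
# Bytes added to each traffic packet by the outer headers (assumed sizes).
OVERHEAD = {
    "ipv4-in-ipv4": 20,             # outer IPv4 header only
    "ipv6-in-ipv6": 40,             # outer IPv6 header only
    "ipv4+udp+8-byte-header": 36,   # outer IPv4 (20) + UDP (8) + hypothetical 8 byte header
}

# Bytes of the offending datagram, after its IP header, that a PTB must quote.
RFC1191_MIN_QUOTED_PAYLOAD = 8


def mtu_for_sending_host(reported_mtu: int, encapsulation: str) -> int:
    """MTU the sending host needs to be told, given the MTU a router reported."""
    return reported_mtu - OVERHEAD[encapsulation]


def ptb_quotes_nonce(quoted_payload: int, nonce_offset: int, nonce_len: int) -> bool:
    """True if the quoted part of the encapsulated packet includes the whole nonce.

    nonce_offset is counted from the end of the outer IP header.
    """
    return quoted_payload >= nonce_offset + nonce_len
```

With a reported MTU of 1500 and plain IPv4-in-IPv4, the sending host would need to be told 1480. If a 4 byte nonce sat directly after an 8 byte UDP header, `ptb_quotes_nonce(RFC1191_MIN_QUOTED_PAYLOAD, 8, 4)` is false: the minimum quote ends where the nonce begins.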
I can't see a terser way of summing up the PMTUD problems of Map & Encap and how MHF differs. Unless there is a robust, efficient method of solving the PMTUD problems, these problems are a show-stopper for Map & Encap. If the RRG's report is going to discuss the strengths and weaknesses of the various approaches, then I think the discussion must be much more complete than the current, overly terse and rather piecemeal text.

I have avoided mentioning how Ivip's approach involves only a handful of packets getting the special handling required for PMTUD, and how those include a nonce, but that all other traffic packets include no such nonce, thereby reducing the overhead. Nor have I mentioned the reason why Ivip sends encapsulated packets with the outer source address being that of the sending host (to automatically extend border router ingress filtering of source addresses to the decapsulated packet, by way of the ETR dropping packets where the inner source address does not match the outer source address).

The next sentence is inadequate too:

   Extra bandwidth is consumed by the ingress tunnel router figuring out whether the egress tunnel router is still available and functioning.

I suggest this be replaced with something like:

   Extra bandwidth, in addition to that required to distribute mapping, may be consumed by some Strategy A approaches. For instance, some architectures use A3a - multiple RLOCs per EID - with the encoders (ingress tunnel routers) being responsible for individually determining which RLOC address to use when encapsulating packets. Multiple RLOC addresses are required because these architectures use A2c or some other non-real-time approach to mapping distribution. The encoders are therefore required to individually determine reachability to each decoder they are currently tunneling packets to. This involves extra state, computational load and bandwidth consumption by each encoder and probably each decoder.
   While this approach may be capable of coping better with some localised outages than A4h, it involves considerable scaling problems for busy encoders and decoders.

   Extra bandwidth is consumed by all encapsulation approaches. A4h encapsulation consists only of an IP-in-IP header (except when the encoder is discovering the PMTU to a decoder). Encapsulation architectures other than A4h add additional headers, such as a UDP header followed by an architecture-specific header, to each traffic packet.

   When probing PMTU, A4h involves extra complexity for the encoder and decoder, with an additional short packet and a differently encapsulated full-length packet, with further packets needed for the decoder to acknowledge receipt of one or both packets. However, this probing activity is only prompted by longer traffic packets which are used as the probes. It is self-limiting since each such probe reduces the uncertainty the encoder has about the PMTU to a given decoder.

This sentence needs to be changed too:

   Border filtering of source addresses becomes problematic.

This is true of LISP, APT and TRRP - but Ivip shouldn't be tarred with the same brush. Here is a suggested replacement.

   The encapsulation architectures which use the encoder's address in the outer header involve significant problems with border filtering of incoming packets based on source address - for instance to stop spoofing of source addresses inside an ISP's network. For such architectures (A4d), the only solution seems to be to replicate this expensive filtering on the decapsulated packets, at every decoder (egress tunnel router). With strategy A4h, border filtering of source addresses can be extended to the decapsulated packets much more easily - by the ETR dropping any inner packets whose source address does not match the outer source address.
   Strategies A4f and A4g have no problems with this filtering, since the modified border routers will continue to filter on the source address, which is unchanged in the header.

This sentence is another instance of a statement which applies to LISP, as currently conceived, but not to Ivip:

   Deployment may require heavy weight "for the public good" relays in the non-upgraded part of the Internet to facilitate migration.

Depending on your view of APT's handling of packets from non-upgraded networks (which I think is more favourable if there is a single APT island, rather than multiple such islands) the above statement is not an adequate assessment of APT either.

Here is a suggestion which retains the critique of LISP - which currently has no business plan for Proxy Tunnel Routers (or a technical arrangement supporting ownership and running which could lead to a favourable business plan) but which recognises that Ivip has more promising arrangements. This would need to be modified or extended to adequately describe APT and TRRP. (TRRP's approach involves a bunch of sticks and carrots which would be challenging to describe in a concise manner!)

   In order to be attractive for early adopters, Strategy A architectures need a means of handling packets sent from networks which lack encoders. This raises technical and business problems, considering the strong desirability of these packets reaching a suitable "open", "proxy" or "relay" encoder which is not too far out of the path the packet needs to take to the correct decoder. Since these "open" (etc.) encoders need to be in the DFZ, and so need to advertise prefixes in order to attract these raw packets from non-upgraded networks, it is vital for routing scaling reasons that each such advertisement covers the EID space of a very large number of end-user networks.
   These constraints lead to questions of who is to operate this presumably widely spread set of "open" encoders, who is to manage the large blocks of address space from which EIDs are subdivided, and how the operators of these encoders will be paid according to the widely varying traffic they handle for packets addressed to different end-user networks. These problems are not necessarily incapable of solution. The best outcome will involve technical arrangements specifically intended to support address management and other business arrangements which make it attractive for organisations to operate these one or more globally distributed networks of "open", "proxy" or "relay" encoders.

For a fuller discussion of what I am trying to summarize here, see:

  Business incentives for LISP PTRs and Ivip OITRDs
  http://psg.com/lists/rrg/2008/msg02021.html

These sentences do apply properly to Ivip:

   During the transition period, it appears difficult to remove legacy prefixes from the global routing table. The best that can be done is to advertise aggregates of legacy prefixes from the relays. This may have an impact on stretch.

I am not sure how well they apply to LISP or APT, since there is little or nothing in those proposals about how end-user networks will gain new EID space, or convert their current PI space to EID space.

The concept of a "transition period" is not necessarily valid. There may never be a complete switch-over of all end-user networks to the new form of addressing. Ivip is not predicated on there being a complete switch-over. The idea is to make the new kind of space so attractive to end-user networks of all types and of all sizes (at least those which require multihoming, portability and/or mobility) that over time, the great majority of end-user networks will adopt the new space. In this model, the routing scaling problem can still be solved without a complete transition.
The first sentence above seems to assume that EID space will often or always be created by converting existing PI prefixes. That will be part of the story, I am sure - for instance for large corporate and university networks with PI space today, who decide to adopt the new form of scalable addressing. However, since the various scalable routing architectures are intended to scale to a vastly higher number of end-user networks (billions) than have PI space today (one or so hundred thousand?), clearly, in the future, most such end-user networks will be getting fresh address space which is already part of the new addressing scheme. The first sentence's concern does not apply to the EID space of all those "new" end-user networks.

"Stretch" (longer total paths than would otherwise be the case) would be a problem if the "relays" (meaning LISP Proxy Tunnel Routers) are not widely distributed. (Actually, it is OK to have PTRs or their Ivip equivalents - OITRDs - only in one area, as long as all the ETRs for EID prefixes advertised by those PTRs or OITRDs are in the same area. I am assuming a more general case which must provide generally optimal paths when both the sending hosts and the ETRs are freely distributed all over the world.)

If my above suggested text is included, I don't think we need these last 3 sentences ("During ..", "The best ..." and "This may ...").

There are challenges. I think Ivip has a promising set of technical and business measures to solve them - and that the other proposals so far do not. We do need to discuss the problems of support of packets from non-upgraded networks, since lack of this would be a show-stopper for any map-encap or map-forward-with-modified-headers architecture. Without going into much longer discussions, I think my text above discusses the challenges and indicates that solutions seem to be possible, but are not yet developed for all architectures.
The current discussion says nothing of the ability of various Strategy A architectures to support mobility. Current mobile IP approaches are limited in scope, and only work for IPv6. They are essential for mobility within a particular network but not suited to the real problem of providing seamless global mobility across every kind of access network. So just adding map-encap or map-forward without special consideration of mobility is not going to enable a better total mobility outcome than what is currently possible with MIPv6.

Mobility support for map-encap or map-forward does not involve changing the mapping every time the MN gets a new CoA. The only feasible solution seems to be to use two-way tunnels from each CoA to an ETR-like "Translating Tunnel Router" (TTR), as described here:

  TTR Mobility Extensions for Core-Edge Separation Solutions to the Internet's Routing Scaling Problem
  http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf

This approach does not add any complexity to the core-edge separation scheme itself, since the TTRs behave like any other ETR or ITR.

The near-real-time mapping approach of Ivip, or any other architecture which adopts the A2d approach, as described above, is clearly in a better position to adapt quickly to end-user needs to change the mapping to a new TTR than the non-real-time approaches of A2a (LISP-NERD) or A2c (LISP-ALT, APT and TRRP).

I suggest some new text such as:

   The optimal new routing and addressing architecture should arguably provide maximal support for mobility on a global scale. Existing mobile IP approaches are for IPv6 only and tend to rely on additional capabilities in the access network and/or on a "home agent" router, which may not be close to the mobile node's current location - raising performance and efficiency problems.
   While strategy A architectures may not pose inherent difficulties for existing mobile IP techniques, it would be an advantage if an architecture supported potentially new mobility techniques which provided global mobility, with any access network, with generally optimal path lengths and without reliance on any fixed "home agent" or on particular capabilities of access networks.

  - Robin