Internet Engineering Task Force                           A. Durand, Ed.
Internet-Draft                                                   Comcast
Intended status: Standards Track                        February 3, 2010
Expires: August 7, 2010

       Dual-stack lite broadband deployments post IPv4 exhaustion
                 draft-ietf-softwire-dual-stack-lite-03

Abstract

   This document revisits the dual-stack model and introduces the dual-
   stack lite technology aimed at better aligning the costs and benefits
   of deploying IPv6. Dual-stack lite enables a broadband service
   provider to share IPv4 addresses among customers by combining two
   well-known technologies: IP in IP (IPv4-in-IPv6) and NAT.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 7, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements language
   3.  Terminology
   4.  Deployment scenarios
     4.1.  Access model
     4.2.  Home gateway
     4.3.  Directly connected device
   5.  B4 element
     5.1.  Definition
     5.2.  Encapsulation
     5.3.  Fragmentation and Reassembly
     5.4.  AFTR discovery
     5.5.  DNS
     5.6.  Interface initialization
     5.7.  Well-known IPv4 address
   6.  AFTR element
     6.1.  Definition
     6.2.  Encapsulation
     6.3.  Fragmentation and Reassembly
     6.4.  DNS
     6.5.  Well-known IPv4 address
     6.6.  Extended binding table
   7.  Network Considerations
     7.1.  Tunneling
     7.2.  VPN
     7.3.  Multicast considerations
   8.  NAT considerations
     8.1.  NAT pool
     8.2.  NAT conformance
     8.3.  ALG
     8.4.  Port allocation
       8.4.1.  How many ports per customers?
       8.4.2.  Dynamic port assignment considerations
       8.4.3.  Subscriber controlled port assignment
     8.5.  Other considerations about sharing global IPv4 addresses
   9.  Acknowledgements
   10. IANA Considerations
   11. Security Considerations
   12. Author's Addresses
   13. Appendix A: future DS-Lite extensions
     13.1.  Static port reservation
       13.1.1.  Port forwarding model
       13.1.2.  A+P model
     13.2.  Dynamic port reservation
       13.2.1.  UPnP
       13.2.2.  NAT-PMP
       13.2.3.  DHCPv6
   14. Appendix B: Examples
     14.1.  Gateway based architecture
       14.1.1.  Example message flow
       14.1.2.  Translation details
     14.2.  Host based architecture
       14.2.1.  Example message flow
       14.2.2.  Translation details
   15. Appendix C: Deployment considerations
     15.1.  AFTR service distribution and horizontal scaling
     15.2.  Horizontal scaling
     15.3.  High availability
   16. References
     16.1.  Normative references
     16.2.  Informative references
   Author's Address

1.  Introduction

   The common thinking for more than 10 years has been that the
   transition to IPv6 will be based on the dual stack model and that
   most things would be converted this way before we ran out of IPv4.

   It has not happened.
   The IANA free pool of IPv4 addresses will be depleted soon, well
   before any significant IPv6 deployment will have occurred.

   This document revisits the dual-stack model and introduces the dual-
   stack lite technology aimed at better aligning the costs and benefits
   of deploying IPv6. Dual-stack lite will provide the necessary bridge
   between the two protocols, offering an evolution path of the Internet
   post IANA IPv4 depletion.

   Dual-stack lite enables a broadband service provider to share IPv4
   addresses among customers by combining two well-known technologies:
   IP in IP (IPv4-in-IPv6) and NAT.

   This document makes a distinction between a dual-stack capable and a
   dual-stack provisioned device. The former is a device that has code
   that implements both IPv4 and IPv6, from the network layer to the
   applications. The latter is a similar device that has been
   provisioned with both an IPv4 and an IPv6 address on its
   interface(s). This document will also further refine this notion by
   distinguishing interfaces provisioned directly by the service
   provider from those provisioned by the customer.

   Pure IPv6-only devices (i.e. devices that do not include an IPv4
   stack) are outside of the scope of this document.

2.  Requirements language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Terminology

   The technology described in this document is known as dual-stack
   lite. The abbreviation DS-Lite will be used throughout this text.

   This document also introduces two new terms: the DS-Lite Basic
   Bridging BroadBand element (B4) and the DS-Lite Address Family
   Transition Router element (AFTR).

4.  Deployment scenarios

4.1.  Access model

   Instead of relying on a cascade of NATs, the dual-stack lite model is
   built on IPv4-in-IPv6 tunnels to cross the network to reach a
   carrier-grade IPv4-IPv4 NAT (the AFTR) where customers will share
   IPv4 addresses. There are a number of benefits to this approach:

   o  This technology decouples the deployment of IPv6 in the service
      provider network (up to the customer premise equipment or CPE)
      from the deployment of IPv6 in the global Internet and in customer
      applications & devices.

   o  The management of the service provider access networks is
      simplified by leveraging the large IPv6 address space.
      Overlapping private IPv4 address spaces are not required to
      support very large customer bases.

   o  As tunnels can terminate anywhere in the service provider network,
      this architecture lends itself to horizontal scaling and provides
      great flexibility to adapt to changing traffic load.

   o  Tunnels provide a direct connection between B4 and the AFTR. This
      can be leveraged to enable customers and their applications to
      control how the NATing function of the AFTR is performed.

   A key characteristic of this approach is that communications between
   end-nodes stay within their address family. IPv6 sources only
   communicate with IPv6 destinations, IPv4 sources only communicate
   with IPv4 destinations. There is no protocol family translation
   involved in this approach. This greatly simplifies the task of
   applications that may carry literal IP addresses in their payload.
   Using DS-Lite, they will not have to include special knowledge to
   deal with the possible presence of a protocol family translator in
   the path.

4.2.  Home gateway

   This section describes home style networks characterized by the
   presence of a home gateway provisioned only with IPv6 by the service
   provider.

   A DS-Lite home gateway is an IPv6 aware home gateway with a B4
   interface implemented in the WAN interface.

   A DS-Lite home gateway SHOULD NOT operate a NAT function on a B4
   interface, as the NAT function will be performed by the AFTR in the
   service provider's network. That will avoid accidentally operating
   in a double NAT environment.

   However, it SHOULD operate its own DHCP(v4) server handing out
   [RFC1918] address space (e.g. 192.168.0.0/16) to hosts in the home.
   It SHOULD advertise itself as the default IPv4 router to those home
   hosts. It SHOULD also advertise itself as a DNS server in the DHCP
   Option 6 (DNS Server). Additionally, it SHOULD operate a DNS proxy
   to accept DNS IPv4 requests from home hosts and send them using IPv6
   to the service provider DNS servers, as described in Section 5.5.

   Note: if an IPv4 home host decides to use another IPv4 DNS server,
   the DS-Lite home gateway will forward those DNS requests via the B4
   interface, the same way it forwards any regular IPv4 packets.

   IPv6 capable devices directly reach the IPv6 Internet. Packets
   simply follow IPv6 routing; they do not go through the tunnel and
   are not subject to any translation. It is expected that most IPv6
   capable devices will also be IPv4 capable and will simply be
   configured with an IPv4 RFC1918 style address within the home network
   and access the IPv4 Internet the same way as the legacy IPv4-only
   devices within the home.

   Pure IPv6-only devices (i.e. devices that do not include an IPv4
   stack) are outside of the scope of this document.

4.3.  Directly connected device

   In broadband home networks, devices are sometimes directly connected
   to the broadband service provider. They are connected straight to a
   modem, without a home gateway. This scenario is identical to wireless
   devices directly connected over the air interface to their provider.

   Under this scenario, the customer device is a dual-stack capable host
   that is provisioned by the service provider only with IPv6. The
   device itself acts as a B4 element and the IPv4 service is provided
   by an IPv4-in-IPv6 tunnel, just as in the home gateway case. That
   device can run any combination of IPv4 and/or IPv6 applications.

   A directly connected DS-Lite device SHOULD send its DNS requests over
   IPv6 to the IPv6 DNS server it has been configured to use.

   As in the previous sections, IPv6 packets follow IPv6 routing; they
   do not go through the tunnel and are not subject to any
   translation.

   The support of IPv4-only devices and IPv6-only devices in this
   scenario is out of scope for this document.

5.  B4 element

5.1.  Definition

   The B4 element is a function implemented on a dual-stack capable
   node, either a directly connected device or a home gateway, that
   creates a tunnel to an AFTR.

5.2.  Encapsulation

   The tunnel is a multi-point to point IPv4-in-IPv6 tunnel ending on a
   service provider AFTR.

   See section 7.1 for additional tunneling considerations.

   Note: at this point, DS-Lite only defines IPv4-in-IPv6 tunnels,
   however other types of encapsulation could be defined in the future.

5.3.  Fragmentation and Reassembly

   Using an encapsulation (IPv4-in-IPv6 or anything else) to carry IPv4
   traffic over IPv6 will reduce the effective MTU of the datagram.
   Unfortunately, path MTU discovery [RFC1191] is not a reliable method
   to deal with this problem.

   A solution to deal with this problem is for the service provider to
   increase the MTU size of all the links between the B4 element and the
   AFTR elements by at least 40 bytes to accommodate both the IPv6
   encapsulation header and the IPv4 datagram without fragmenting the
   IPv6 packet.
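
   As a rough illustration of the 40-byte arithmetic above, the
   following Python sketch computes the largest IPv4 datagram that fits
   in one unfragmented tunnel packet, and how many IPv6 packets are
   needed otherwise. The function names and the 1500 and 1540 byte link
   MTUs are illustrative assumptions, not part of this specification.
   The 8-byte Fragment header and the 8-byte alignment of fragment data
   come from the base IPv6 specification, not from this document.

```python
import math

IPV6_HEADER_LEN = 40       # fixed IPv6 header added by the encapsulation
IPV6_FRAG_HEADER_LEN = 8   # IPv6 Fragment extension header
IPV6_MIN_LINK_MTU = 1280   # minimum link MTU required by IPv6


def tunnel_ipv4_mtu(link_mtu: int) -> int:
    """Largest IPv4 datagram carried in one unfragmented IPv6 packet."""
    if link_mtu < IPV6_MIN_LINK_MTU:
        raise ValueError("IPv6 requires a link MTU of at least 1280 bytes")
    return link_mtu - IPV6_HEADER_LEN


def ipv6_packets_needed(ipv4_len: int, link_mtu: int) -> int:
    """Number of IPv6 packets needed to carry one encapsulated datagram."""
    if ipv4_len <= tunnel_ipv4_mtu(link_mtu):
        return 1  # fits as-is, no fragmentation after encapsulation
    # Each fragment carries the IPv6 header plus a Fragment header; the
    # data in every fragment but the last must be a multiple of 8 bytes.
    room = link_mtu - IPV6_HEADER_LEN - IPV6_FRAG_HEADER_LEN
    per_fragment = (room // 8) * 8
    return math.ceil(ipv4_len / per_fragment)


if __name__ == "__main__":
    for link_mtu in (1500, 1540):
        print(link_mtu, tunnel_ipv4_mtu(link_mtu),
              ipv6_packets_needed(1500, link_mtu))
```

   Under these assumptions, a full-size 1500-byte IPv4 datagram needs
   two IPv6 packets on a 1500-byte link, but only one when the link MTU
   is raised by 40 bytes to 1540.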

   However, as not all service providers will be able to increase their
   link MTU, the B4 element MUST perform fragmentation and reassembly if
   the outgoing link MTU cannot accommodate the extra IPv6 header.
   Fragmentation MUST happen after the encapsulation on the IPv6 packet.
   Reassembly MUST happen before the decapsulation of the IPv6 header.
   The detailed procedure is specified in [RFC2473] Section 7.2.

5.4.  AFTR discovery

   In order to configure the IPv4-in-IPv6 tunnel, the B4 element needs
   the IPv6 address of the AFTR element. This IPv6 address can be
   configured using a variety of methods, including an out-of-band
   mechanism, manual configuration, or a variety of DHCPv6 options.

   In order to guarantee interoperability, a B4 element SHOULD implement
   the DHCPv6 option defined in
   [I-D.ietf-softwire-ds-lite-tunnel-option].

5.5.  DNS

   A B4 element is only configured from the service provider with IPv6.
   As such, it can only learn the address of a DNS recursive server
   through DHCPv6 (or other similar method over IPv6). As DHCPv6 only
   defines an option to get the IPv6 address of such a DNS recursive
   server, the B4 element cannot easily discover the IPv4 address of
   such a recursive DNS server, and as such will have to perform all DNS
   resolution over IPv6.

   The B4 element can pass this IPv6 address to downstream IPv6 nodes,
   but not to downstream IPv4 nodes. As such, the B4 element MUST
   implement a DNS proxy, following the recommendations of [RFC5625].

5.6.  Interface initialization

   Initialization of the interface including a B4 element is out-of-
   scope in this specification.

5.7.  Well-known IPv4 address

   Any locally unique IPv4 address could be configured on the IPv4-in-
   IPv6 tunnel to represent the B4 element. Configuring such an address
   is often necessary when the B4 element is sourcing IPv4 datagrams
   directly over the tunnel.
   In order to avoid conflicts with any other address, IANA has defined
   a well-known range, 192.0.0.0/29.

   192.0.0.0 is the reserved subnet address. 192.0.0.1 is reserved for
   the AFTR element. The B4 element SHOULD use any other address
   within the 192.0.0.0/29 range.

   Note: a range of addresses has been reserved for this purpose. The
   intent is to accommodate nodes implementing several B4 elements.
   The mechanism to decide which of those addresses to use on a B4
   element is implementation dependent and out of scope for this
   document.

6.  AFTR element

6.1.  Definition

   An AFTR element is the combination of an IPv4-in-IPv6 tunnel end-
   point and an IPv4-IPv4 NAT implemented on the same node.

6.2.  Encapsulation

   The tunnel is a point-to-multipoint IPv4-in-IPv6 tunnel ending at the
   service provider subscribers' B4 elements.

   See section 7.1 for additional tunneling considerations.

   Note: at this point, DS-Lite only defines IPv4-in-IPv6 tunnels,
   however other types of encapsulation could be defined in the future.

6.3.  Fragmentation and Reassembly

   As noted previously, fragmentation and reassembly need to be taken
   care of by the tunnel end-points. As such, the AFTR MUST perform
   fragmentation and reassembly if the underlying link MTU cannot
   accommodate the extra IPv6 header of the tunnel. Fragmentation
   MUST happen after the encapsulation on the IPv6 packet. Reassembly
   MUST happen before the decapsulation of the IPv6 header. The detailed
   procedure is specified in [RFC2473] Section 7.2.

   Fragmentation at the Tunnel Entry-Point is a lightweight
   operation. In contrast, reassembly at the Tunnel Exit-Point can be
   expensive. When the Tunnel Exit-Point receives the first fragmented
   packet, it must wait for the second fragmented packet to arrive in
   order to reassemble the two fragmented IPv6 packets for
   decapsulation.
   This requires the Tunnel Exit-Point to buffer and keep track of
   fragmented packets. Consider that the AFTR is the Tunnel Exit-Point
   for many tunnels. If many clients simultaneously source a large
   number of fragmented packets to the AFTR, the AFTR will have to
   buffer them and consume enormous resources to keep track of the
   flows. This reassembly process will significantly impact the AFTR
   performance. However, this impact only happens when many clients
   simultaneously source large IPv4 packets. Since we believe that the
   majority of clients will receive large IPv4 packets (such as when
   watching video streams) rather than source them (such as when
   sourcing video streams), reassembly is expected to be only a
   fraction of the AFTR's overall workload.

   Other methods to avoid fragmentation, such as rewriting the TCP MSS
   option or using technologies such as Subnetwork Encapsulation and
   Adaptation Layer defined in [I-D.templin-seal] are out of scope for
   this document.

6.4.  DNS

   As noted previously, DS-Lite nodes implementing a B4 element will
   perform DNS resolution over IPv6. As such, very little, if any, DNS
   traffic will flow through the AFTR element.

6.5.  Well-known IPv4 address

   The AFTR MAY use the well-known IPv4 address 192.0.0.1 reserved by
   IANA to configure the IPv4-in-IPv6 tunnel. That address can then be
   used to report ICMP problems and will appear in traceroute outputs.

6.6.  Extended binding table

   The NAT binding table of the AFTR element is extended to include the
   source IPv6 address of the incoming packets. This IPv6 address will
   disambiguate between the overlapping IPv4 address space of the
   service provider customers.

   By doing a reverse look-up in the extended IPv4 NAT binding table,
   the AFTR knows how to reconstruct the IPv6 encapsulation when
   packets come back from the Internet.
   That way, there is no need to keep a static configuration for each
   tunnel.

7.  Network Considerations

7.1.  Tunneling

   Tunneling MUST be done in accordance with [RFC2473] and [RFC4213].
   Traffic classes ([RFC2474]) from the IPv4 headers SHOULD be carried
   over to the IPv6 headers and vice versa.

7.2.  VPN

   The combination of the dual-stack lite technology with either IPv4
   VPNs or IPv6 VPNs is out of scope for this document.

7.3.  Multicast considerations

   Multicast is out-of-scope in this document.

8.  NAT considerations

8.1.  NAT pool

   It is expected that AFTRs will operate distinct, non-overlapping NAT
   pools. However, those NAT pools do not have to be contiguous.

8.2.  NAT conformance

   A dual-stack lite AFTR SHOULD implement behavior conforming to the
   best current practice, currently documented in [RFC4787], [RFC5382]
   and [RFC5508]. Other requirements for AFTRs can be found in
   [I-D.nishitani-cgn].

8.3.  ALG

   The AFTR should only perform a minimum number of ALGs for the classic
   applications such as FTP, RTSP/RTP, IPsec and PPTP VPN pass-through,
   and enable the users to use their own ALGs on statically or
   dynamically reserved ports instead.

8.4.  Port allocation

8.4.1.  How many ports per customers?

   Because IPv4 addresses will be shared among customers and potentially
   a large address space reduction factor may be applied, on average,
   only a limited number N of TCP or UDP port numbers will be available
   per customer. This means that applications opening a very large
   number of TCP ports may have a harder time working. For example, it
   has been reported that a very well-known web site was using AJAX
   techniques and was opening up to 69 TCP ports per web page.
   If we make the hypothesis of an address space reduction of a factor
   100 (one IPv4 address per 100 customers), and 65k ports per IPv4
   address available, that makes an average of N = 650 ports available
   simultaneously to be shared among the various devices behind the
   dual-stack lite tunnel end-point.

   There is an important operational difference if those N ports are
   pre-allocated in a cookie-cutter fashion versus allocated on demand
   by incoming connections. This is a difference between an average of
   N ports and a maximum of N ports. Several service providers have
   reported an average number of connections per customer in the single
   digits. At the opposite end, thousands or tens of thousands of ports
   could be used at peak by any single customer browsing a number of
   AJAX/Web 2.0 sites.

   As such, service providers allocating a fixed number of ports per
   user should dimension the system with a minimum of N = several
   thousands of ports for every user. This would bring the address
   space reduction ratio to a single digit. Service providers using a
   smaller number of ports per user (N in the hundreds) should expect
   customers' applications to break in a more or less random way over
   time.

   In order to achieve higher address space reduction ratios, it is
   recommended that service providers do not use this cookie-cutter
   approach, and, on the contrary, allocate ports as dynamically as
   possible, just like on a regular NAT. With an average number of
   connections per customer in the single digits, having an address
   space reduction of a factor 100 is realistic. However, service
   providers should exercise caution and make sure their pool of port
   numbers does not go too low. The actual maximum address space
   reduction factor is unknown at this time.

8.4.2.  Dynamic port assignment considerations

   When dynamic port assignment is used to maximize the number of
   subscribers sharing each AFTR global IPv4 address, the AFTR should
   implement checks to avoid DoS attacks through exhaustion of available
   ports. It should also avoid mapping any one subscriber's "flows"
   across more than one global IPv4 address.

8.4.3.  Subscriber controlled port assignment

   Dynamic port assignment precludes inbound access to subscriber
   servers, just as in a home gateway NAT. Inbound access to subscriber
   servers can be provided through pre-assigned and/or reserved port
   mappings in the AFTR. Specifying the mechanisms for managing and
   signaling these reserved port mappings is out of scope for this
   document, however some techniques are mentioned in appendix A as
   examples.

8.5.  Other considerations about sharing global IPv4 addresses

   More considerations on sharing the port space of IPv4 addresses can
   be found in [I-D.ford-shared-addressing-issues].

9.  Acknowledgements

   The authors would like to acknowledge the role of Mark Townsley for
   his input on the overall architecture of this technology by pointing
   this work in the direction of [I-D.droms-softwires-snat]. Note that
   this document results from a merging of [I-D.durand-dual-stack-lite]
   and [I-D.droms-softwires-snat]. Also to be acknowledged are the many
   discussions with a number of people including Shin Miyakawa,
   Katsuyasu Toyama, Akihide Hiura, Takashi Uematsu, Tetsutaro Hara,
   Yasunori Matsubayashi, Ichiro Mizukoshi. The authors would also like
   to thank David Ward, Jari Arkko, Thomas Narten and Geoff Huston for
   their constructive feedback. Special thanks go to Dave Thaler and
   Dan Wing for their reviews and comments.

10.  IANA Considerations

   This draft requests IANA to allocate a well-known IPv4 192.0.0.0/29
   network prefix. That range is used to number the dual-stack lite
   interfaces.
   Reserving a /29 allows for 6 possible interfaces on a multi-homed
   node. The IPv4 address 192.0.0.1 is reserved as the IPv4 address of
   the default router for such dual-stack lite hosts.

11.  Security Considerations

   Security issues associated with NAT have long been documented. See
   [RFC2663] and [RFC2993].

   However, moving the NAT functionality from the home gateway to the
   core of the service provider network and sharing IPv4 addresses among
   customers creates additional requirements when logging data for abuse
   usage. With any architecture where an IPv4 address does not uniquely
   represent an end host, an IPv4 address and a timestamp are no longer
   sufficient to identify a particular broadband customer. Additional
   information such as transport protocol information will be required
   for that purpose. For example, we suggest logging the transport port
   number for TCP and UDP connections.

   The AFTR performs translation functions for interior IPv4 hosts at
   RFC 1918 addresses or at the IANA reserved address range (TBA by
   IANA). If the interior host is properly using the authorized IPv4
   address with the authorized transport protocol port range such as A+P
   semantic for the tunnel, the AFTR can simply forward without
   translation to permit the authorized address and port range to
   function properly. All packets with unauthorized interior IPv4
   addresses or with authorized interior IPv4 address but unauthorized
   port range MUST NOT be forwarded by the AFTR. This prevents rogue
   devices from launching denial of service attacks using unauthorized
   public IPv4 addresses in the IPv4 source header field or unauthorized
   transport port range in the IPv4 transport header field. For
   example, rogue devices could bombard a public web server by launching
   a TCP SYN ACK attack.
The victim will receive TCP SYN from random IPv4 581 source addresses at a rapid rate and deny TCP services to legitimate 582 users. 584 With IPv4 addresses shared by multiple users, ports become a critical 585 resource. As such, some mechanisms need to be put in place by an 586 AFTR to limit port usage, either by rate-limiting new connections or 587 putting a hard limit on the maximum number of port usable by single 588 user. If this number is high enough, it should not interfere with 589 normal usage and still provide reasonable protection of the shared 590 pool. More considerations on ports allocation and port exhaustion 591 can be found in section 8.4. 593 More considerations on sharing IPv4 addresses can be found in 594 "I-D.ford-shared-addressing-issues". 596 AFTRs should support ways to limit service to registered customers. 597 If strict IPv6 ingress filtering is deployed in the broadband network 598 to prevent IPv6 address spoofing and dual-stack lite service is 599 restricted to those customers, then tunnels terminating at the AFTR 600 and coming from registered customer IPv6 addresses cannot be spoofed. 601 Thus a simple access control list on the tunnel transport source 602 address is all what is required to accept traffic on the southbound 603 interface of an AFTR. 605 If IPv6 address spoofing prevention is not in place, the AFTR should 606 perform further sanity checks on the IPv6 address of incoming IPv6 607 packets. For example, it should check if the address has really been 608 allocated to an authorized customer. 610 12. 
Authors' Addresses 612 This document is the result of the work of the following authors: 614 Alain Durand 615 Comcast 616 1, Comcast center 617 Philadelphia, PA 19103 618 USA 619 Email: alain_durand@cable.comcast.com 621 Ralph Droms 622 Cisco 623 1414 Massachusetts Avenue 624 Boxborough, MA 01714 625 USA 626 Phone: +1 978.936.1674 627 Email: rdroms@cisco.com 629 Brian Haberman 630 Johns Hopkins University Applied Physics Lab 631 11100 Johns Hopkins Road 632 Laurel, MD 20723-6099 633 USA 634 Phone: +1 443 778 1319 635 Email: brian@innovationslab.net 636 James Woodyatt 637 Apple Inc. 638 1 Infinite Loop 639 Cupertino, CA 95014 640 USA 641 Email: jhw@apple.com 643 Yiu Lee 644 Comcast 645 1, Comcast center 646 Philadelphia, PA 19103 647 USA 648 Email: yiu_lee@cable.comcast.com 650 Randy Bush 651 Internet Initiative Japan 652 5147 Crystal Springs 653 Bainbridge Island, Washington 98110 654 USA 655 Phone: +1 206 780 0431 x1 656 Email: randy@psg.com 658 13. Appendix A: future DS-Lite extensions 660 Techniques discussed below are not part of the core dual-stack lite 661 specification and will be developed in separate documents. They are 662 only listed here as examples. 664 Applications expecting incoming connections, such as peer-to-peer ones, 665 have become popular. Those applications use a very limited number of 666 ports, usually a single one. Making sure those applications keep 667 working in a dual-stack lite environment is important. Similarly, 668 there is a growing list of applications that require some kind of ALG 669 to work through a NAT. Service provider AFTRs should not stand in 670 the way of the deployment of such applications. As such, there is a 671 legitimate need to leave certain ports under the control of the end 672 user. This argues for a hybrid environment, where most ports are 673 dynamically managed by the AFTR in a shared pool and a limited number 674 are dedicated to each user and controlled by that user. 676 13.1.
Static port reservation 678 A service provider can reserve a static number of ports per user. 679 Note: those could be TCP and/or UDP ports. The simplest model to 680 allow users to control the associated NAT bindings is to offer a web 681 interface (for example as part of the service provider portal) where, 682 once authenticated, a user can configure each dedicated external IPv4 683 address/port binding on the AFTR using either the port forwarding 684 semantic or the A+P semantic. 686 Note: The exact number of ports reserved per user is left to the 687 discretion of the service provider. 689 13.1.1. Port forwarding model 691 In this model, the subscriber directs the AFTR to rewrite the 692 destination address in those incoming packets to a private IPv4 693 address within the home network. For obvious security reasons, 694 redirection to a global IPv4 address should not be authorized. Note: 695 this behavior is very similar to the port forwarding function found 696 in most home gateways. 698 13.1.2. A+P model 700 The subscriber directs the AFTR to forward incoming traffic on a 701 given address/port to the dual-stack lite home gateway, and lets this 702 device deal with it. This requires support for the A+P [I-D.ymbk-aplusp] 703 semantic on both the AFTR and the home gateway. 705 In particular, an A+P aware home router can locally NAT A+P packets 706 to and from internal hosts. Alternatively, it can forward the traffic 707 directly to those hosts if they are configured, for example, with 708 an A+P secondary address and ports. 710 An AFTR forwards packets in the A+P range directly to and from the 711 tunnels without NAT. 713 13.2. Dynamic port reservation 715 13.2.1. UPnP 717 A B4 element can act as a UPnP relay, forwarding UPnP messages over 718 the tunnel to the AFTR. This may work in some cases, but not all the 719 time. Some applications insist on running on a well-known port 720 number (or port range), using UPnP to request the NAT to reserve that 721 port.
Those ports may or may not be available; they could be used by 722 another customer. Using UPnP, a NAT box does not have any way to 723 redirect such applications to use another port; the only option is to 724 deny the request. Those applications typically then cycle through a 725 small range of ports (typically 10 or so) until they abort. The 726 likelihood of those ports being all already in use by other users is 727 an inverse function of the address space reduction, i.e., how many 728 users are sharing the same address. 730 Note: the UPnP forum is reported to be addressing this issue in an 731 upcoming version of the IGD profile. 733 13.2.2. NAT-PMP 735 NAT-PMP [I-D.cheshire-nat-pmp] offers a better semantic, by enabling 736 the NAT to redirect the application to use another unallocated port. 737 A B4 element could proxy the NAT-PMP messages to the AFTR through the 738 tunnel. 740 13.2.3. DHCPv6 742 If more ports need to be reserved outside of that static dedicated 743 range, a DHCPv6 option such as 744 [I-D.bajko-v6ops-port-restricted-ipaddr-assign] may also be an 745 interesting approach. This may be limited to the A+P semantic 746 mentioned above, as there might not be a way to explicitly control 747 the port forwarding semantic. Also, there are concerns that this 748 would lead to a cookie-cutter distribution of ports per customer, 749 dramatically reducing the ratio of customers per IPv4 address. 751 14. Appendix B: Examples 753 14.1. Gateway based architecture 755 This architecture is targeted at residential broadband deployments 756 but can be adapted easily to other types of deployment where the 757 installed base of IPv4-only devices is large. 759 Consider a scenario where a Dual-Stack lite home gateway is 760 provisioned only with IPv6 on the WAN port, no IPv4. The home 761 gateway acts as an IPv4 DHCP server for the LAN network (wireline and 762 wireless) handing out RFC1918 addresses.
In addition, the home 763 gateway may support IPv6 Auto-Configuration and/or act as a DHCPv6 server for 764 the LAN network. When an IPv4-only device connects to the home 765 gateway, the gateway will hand it an RFC1918 address. When a 766 dual-stack capable device connects to the home gateway, the gateway 767 will hand out an RFC1918 address and a global IPv6 address to the 768 device. In addition, the home gateway will create an IPv4-in-IPv6 769 softwire tunnel [RFC5571] to an AFTR that resides in the service 770 provider network. 772 When the device accesses an IPv6 service, it will send the IPv6 datagram 773 to the home gateway natively. The home gateway will route the 774 traffic upstream to the default gateway. 776 When the device accesses an IPv4 service, it will source the IPv4 777 datagram with the RFC1918 address and send the IPv4 datagram to the 778 home gateway. The home gateway will encapsulate the IPv4 datagram 779 inside the IPv4-in-IPv6 softwire tunnel and forward the IPv6 datagram 780 to the AFTR. This contrasts with what home gateways normally do today, 781 which is to NAT the RFC1918 address to a public IPv4 address and 782 route the datagram upstream. When the AFTR receives the IPv6 783 datagram, it will remove the IPv6 header and perform an IPv4-to-IPv4 784 NAT on the source address. 786 As illustrated in Figure 1, this dual-stack lite deployment model 787 consists of three components: the dual-stack lite home router with a 788 B4 element, the AFTR and a softwire between the B4 element acting as 789 softwire initiator (SI) [RFC5571] in the dual-stack lite home router 790 and the softwire concentrator (SC) [RFC5571] in the AFTR. The AFTR 791 performs IPv4-IPv4 NAT translations to multiplex multiple subscribers 792 through a pool of global IPv4 addresses. Overlapping address spaces 793 used by subscribers are disambiguated through the identification of 794 tunnel endpoints.
796 +-----------+ 797 | Host | 798 +-----+-----+ 799 |10.0.0.1 800 | 801 | 802 |10.0.0.2 803 +---------|---------+ 804 | | | 805 | Home router | 806 |+--------+--------+| 807 || B4 || 808 |+--------+--------+| 809 +--------|||--------+ 810 |||2001:0:0:1::1 811 ||| 812 |||<-IPv4-in-IPv6 softwire 813 ||| 814 -------|||------- 815 / ||| \ 816 | ISP core network | 817 \ ||| / 818 -------|||------- 819 ||| 820 |||2001:0:0:2::1 821 +--------|||--------+ 822 | AFTR | 823 |+--------+--------+| 824 || Concentrator || 825 |+--------+--------+| 826 | |NAT| | 827 | +-+-+ | 828 +---------|---------+ 829 |129.0.0.1 830 | 831 --------|-------- 832 / | \ 833 | Internet | 834 \ | / 835 --------|-------- 836 | 837 |128.0.0.1 838 +-----+-----+ 839 | IPv4 Host | 840 +-----------+ 842 Figure 1: gateway-based architecture 844 Notes: 846 o The dual-stack lite home router is not required to be on the same 847 link as the host 849 o The dual-stack lite home router could be replaced by a dual-stack 850 lite router in the service provider network 852 The resulting solution accepts an IPv4 datagram that is encapsulated 853 into an IPv4-in-IPv6 softwire datagram for transmission across the 854 softwire. At the corresponding endpoint, the IPv4 datagram is 855 decapsulated, and the translated IPv4 address is inserted based on a 856 translation from the softwire. 858 14.1.1. Example message flow 860 In the example shown in Figure 2, the translation table in the AFTR 861 is configured to forward between IP/TCP (10.0.0.1/10000) and IP/TCP 862 (129.0.0.1/5000). That is, a datagram received by the dual-stack 863 lite home router from the host at address 10.0.0.1, using TCP SRC 864 port 10000, will be translated to a datagram with IP SRC address 865 129.0.0.1 and TCP SRC port 5000 in the Internet.
867 +-----------+ 868 | Host | 869 +-----+-----+ 870 | |10.0.0.1 871 IPv4 datagram 1 | | 872 | | 873 v |10.0.0.2 874 +---------|---------+ 875 | | | 876 | home router | 877 |+--------+--------+| 878 || B4 || 879 |+--------+--------+| 880 +--------|||--------+ 881 | |||2001:0:0:1::1 882 IPv6 datagram 2| ||| 883 | |||<-IPv4-in-IPv6 softwire 884 | ||| 885 -----|-|||------- 886 / | ||| \ 887 | ISP core network | 888 \ | ||| / 889 -----|-|||------- 890 | ||| 891 | |||2001:0:0:2::1 892 +------|-|||--------+ 893 | | AFTR | 894 | v ||| | 895 |+--------+--------+| 896 || Concentrator || 897 |+--------+--------+| 898 | |NAT| | 899 | +-+-+ | 900 +---------|---------+ 901 | |129.0.0.1 902 IPv4 datagram 3 | | 903 | | 904 -----|--|-------- 905 / | | \ 906 | Internet | 907 \ | | / 908 -----|--|-------- 909 | | 910 v |128.0.0.1 911 +-----+-----+ 912 | IPv4 Host | 913 +-----------+ 914 Figure 2: Outbound Datagram 916 +-----------------+--------------+---------------+ 917 | Datagram | Header field | Contents | 918 +-----------------+--------------+---------------+ 919 | IPv4 datagram 1 | IPv4 Dst | 128.0.0.1 | 920 | | IPv4 Src | 10.0.0.1 | 921 | | TCP Dst | 80 | 922 | | TCP Src | 10000 | 923 | --------------- | ------------ | ------------- | 924 | IPv6 Datagram 2 | IPv6 Dst | 2001:0:0:2::1 | 925 | | IPv6 Src | 2001:0:0:1::1 | 926 | | IPv4 Dst | 128.0.0.1 | 927 | | IPv4 Src | 10.0.0.1 | 928 | | TCP Dst | 80 | 929 | | TCP Src | 10000 | 930 | --------------- | ------------ | ------------- | 931 | IPv4 datagram 3 | IPv4 Dst | 128.0.0.1 | 932 | | IPv4 Src | 129.0.0.1 | 933 | | TCP Dst | 80 | 934 | | TCP Src | 5000 | 935 +-----------------+--------------+---------------+ 937 Datagram header contents 939 When datagram 1 is received by the dual-stack lite home router, the 940 B4 function encapsulates the datagram in datagram 2 and forwards it 941 to the dual-stack lite carrier-grade NAT over the softwire.
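The encapsulation step performed by the B4 can be illustrated with a minimal sketch. It assumes plain IPv4-in-IPv6 tunneling as in [RFC2473], where the IPv6 Next Header field is set to 4; the function names, the default hop limit of 64 and the zero traffic class and flow label are illustrative choices, not part of this specification.

```python
import ipaddress
import struct

# IPv6 Next Header value meaning "the payload is an IPv4 packet".
IPPROTO_IPV4 = 4
IPV6_HEADER_LEN = 40


def encapsulate(ipv4_datagram: bytes, b4_addr: str, aftr_addr: str,
                hop_limit: int = 64) -> bytes:
    """Prepend a fixed 40-byte IPv6 header to an IPv4 datagram (hypothetical helper)."""
    version_tc_flow = 6 << 28  # version 6, traffic class 0, flow label 0 (assumed)
    header = struct.pack(
        "!IHBB16s16s",
        version_tc_flow,
        len(ipv4_datagram),  # IPv6 payload length
        IPPROTO_IPV4,
        hop_limit,
        ipaddress.IPv6Address(b4_addr).packed,    # softwire source (B4)
        ipaddress.IPv6Address(aftr_addr).packed,  # softwire destination (AFTR)
    )
    return header + ipv4_datagram


def decapsulate(ipv6_datagram: bytes) -> bytes:
    """Return the inner IPv4 datagram, as the softwire concentrator would."""
    if ipv6_datagram[6] != IPPROTO_IPV4:
        raise ValueError("not an IPv4-in-IPv6 datagram")
    (payload_len,) = struct.unpack("!H", ipv6_datagram[4:6])
    return ipv6_datagram[IPV6_HEADER_LEN:IPV6_HEADER_LEN + payload_len]


# Datagram 1 (a dummy 20-byte IPv4 header here) becomes datagram 2.
datagram_1 = b"\x45" + bytes(19)
datagram_2 = encapsulate(datagram_1, "2001:0:0:1::1", "2001:0:0:2::1")
```

The sketch ignores fragmentation and MTU handling, which a real B4 or AFTR would need to address.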
943 When the AFTR receives datagram 2, its tunnel concentrator 944 hands the IPv4 datagram to the NAT, which determines from its 945 translation table that the datagram received on Softwire_1 with TCP 946 SRC port 10000 should be translated to datagram 3 with IP SRC address 947 129.0.0.1 and TCP SRC port 5000. 949 Figure 3 shows an inbound message received at the AFTR. When the NAT 950 function in the AFTR receives datagram 1, it looks up the IP/TCP DST 951 in its translation table. In the example in Figure 3, the NAT 952 translates the TCP DST port to 10000, sets the IP DST address to 953 10.0.0.1 and hands the datagram to the SC for transmission over 954 Softwire_1. The B4 in the home router decapsulates the IPv4 datagram 955 from the inbound softwire datagram, and forwards it to the host. 957 +-----------+ 958 | Host | 959 +-----+-----+ 960 ^ |10.0.0.1 961 IPv4 datagram 3 | | 962 | | 963 | |10.0.0.2 964 +---------|---------+ 965 | +-+-+ | 966 | home router | 967 |+--------+--------+| 968 || B4 || 969 |+--------+--------+| 970 +--------|||--------+ 971 ^ |||2001:0:0:1::1 972 IPv6 datagram 2 | ||| 973 | |||<-IPv4-in-IPv6 softwire 974 | ||| 975 -----|-|||------- 976 / | ||| \ 977 | ISP core network | 978 \ | ||| / 979 -----|-|||------- 980 | ||| 981 | |||2001:0:0:2::1 982 +------|-|||--------+ 983 | AFTR | 984 |+--------+--------+| 985 || Concentrator || 986 |+--------+--------+| 987 | |NAT| | 988 | +-+-+ | 989 +---------|---------+ 990 ^ |129.0.0.1 991 IPv4 datagram 1 | | 992 | | 993 -----|--|-------- 994 / | | \ 995 | Internet | 996 \ | | / 997 -----|--|-------- 998 | | 999 | |128.0.0.1 1000 +-----+-----+ 1001 | IPv4 Host | 1002 +-----------+ 1004 Figure 3: Inbound Datagram 1006 +-----------------+--------------+---------------+ 1007 | Datagram | Header field | Contents | 1008 +-----------------+--------------+---------------+ 1009 | IPv4 datagram 1 | IPv4 Dst | 129.0.0.1 | 1010 | | IPv4 Src | 128.0.0.1 | 1011 | | TCP Dst | 5000 | 1012 | | TCP Src | 80 | 1013 |
--------------- | ------------ | ------------- | 1014 | IPv6 Datagram 2 | IPv6 Dst | 2001:0:0:1::1 | 1015 | | IPv6 Src | 2001:0:0:2::1 | 1016 | | IPv4 Dst | 10.0.0.1 | 1017 | | IPv4 Src | 128.0.0.1 | 1018 | | TCP Dst | 10000 | 1019 | | TCP Src | 80 | 1020 | --------------- | ------------ | ------------- | 1021 | IPv4 datagram 3 | IPv4 Dst | 10.0.0.1 | 1022 | | IPv4 Src | 128.0.0.1 | 1023 | | TCP Dst | 10000 | 1024 | | TCP Src | 80 | 1025 +-----------------+--------------+---------------+ 1027 Datagram header contents 1029 14.1.2. Translation details 1031 The AFTR has a NAT that translates between softwire/port pairs and 1032 IPv4-address/port pairs. The same translation is applied to IPv4 1033 datagrams received on the device's external interface and from the 1034 softwire endpoint in the device. 1036 In Figure 2, the translator network interface in the AFTR is on the 1037 Internet, and the softwire interface connects to the dual-stack lite 1038 home router. The AFTR translator is configured as follows: 1040 Network interface: Translate IPv4 destination address and TCP 1041 destination port to the softwire identifier and TCP destination 1042 port 1044 Softwire interface: Translate softwire identifier and TCP source 1045 port to IPv4 source address and TCP source port 1047 Here is how the translation in Figure 3 works: 1049 o Datagram 1 is received on the AFTR translator network interface. 1050 The translator looks up the IPv4-address/port pair in its 1051 translation table, rewrites the IPv4 destination address to 1052 10.0.0.1 and the TCP destination port to 10000, and hands the datagram 1053 to the SC to be forwarded over the softwire. 1055 o The IPv4 datagram is received on the dual-stack lite home router 1056 B4. The B4 function extracts the IPv4 datagram and the dual-stack 1057 lite home router forwards datagram 3 to the host.
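The two lookups described above can be sketched as a pair of dictionaries keyed on the softwire identifier. This is only an illustration under stated assumptions: the class name `AftrNat`, the sequential allocation of external ports starting at 5000 and the absence of timeouts and per-user port limits are hypothetical simplifications, not behavior defined by this document.

```python
from dataclasses import dataclass, field


@dataclass
class AftrNat:
    """Toy translation table keyed on (softwire id, IPv4 address, protocol, port)."""
    external_ip: str
    next_port: int = 5000  # assumed starting point for external ports
    outbound: dict = field(default_factory=dict)
    inbound: dict = field(default_factory=dict)

    def translate_outbound(self, softwire_id, src_ip, proto, src_port):
        """Softwire interface: map the interior source to an external address/port."""
        key = (softwire_id, src_ip, proto, src_port)
        if key not in self.outbound:
            external = (self.external_ip, proto, self.next_port)
            self.next_port += 1
            self.outbound[key] = external
            self.inbound[external] = key
        return self.outbound[key]

    def translate_inbound(self, dst_ip, proto, dst_port):
        """Network interface: map the external destination back to a softwire, or None."""
        return self.inbound.get((dst_ip, proto, dst_port))


nat = AftrNat("129.0.0.1")
# Two subscribers use the same RFC1918 address and port on different softwires.
first = nat.translate_outbound("2001:0:0:1::1", "10.0.0.1", "TCP", 10000)
second = nat.translate_outbound("2001:0:0:3::1", "10.0.0.1", "TCP", 10000)
```

Because the softwire identifier is part of the key, the two overlapping interior addresses receive distinct external ports, and an inbound lookup on 129.0.0.1/TCP/5000 returns the first subscriber's softwire and interior address. The second softwire address, 2001:0:0:3::1, is made up for the example.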
1059 +----------------------------------+--------------------+ 1060 | Softwire-Id/IPv4/Prot/Port | IPv4/Prot/Port | 1061 +----------------------------------+--------------------+ 1062 | 2001:0:0:1::1/10.0.0.1/TCP/10000 | 129.0.0.1/TCP/5000 | 1063 +----------------------------------+--------------------+ 1065 Dual-Stack lite carrier-grade NAT translation table 1067 The Softwire-Id is the IPv6 address assigned to the Dual-Stack lite 1068 home gateway. Hosts behind the same Dual-Stack lite home router have 1069 the same Softwire-Id. The source IPv4 address is the RFC1918 address 1070 assigned by the Dual-Stack lite home router, which is unique to each host 1071 behind the home gateway. The AFTR would receive packets sourced from 1072 different IPv4 addresses in the same softwire tunnel. The AFTR 1073 combines the Softwire-Id and IPv4 address/Port [Softwire-Id, IPv4+ 1074 Port] to uniquely identify the host behind the same Dual-Stack lite 1075 home router. 1077 14.2. Host based architecture 1079 This architecture is targeted at new, large scale deployments of 1080 dual-stack capable devices implementing a dual-stack lite interface. 1082 Consider a scenario where a Dual-Stack lite host device is directly 1083 connected to the service provider network. The host device is dual-stack 1084 capable but is provisioned only with a global IPv6 address. In addition, 1085 the host device will pre-configure a well-known IPv4 non-routable 1086 address (see IANA section). This well-known IPv4 non-routable 1087 address is similar to the 127.0.0.1 loopback address. Every host 1088 device implementing Dual-Stack lite will pre-configure the same 1089 address. This address will be used to source the IPv4 datagram when 1090 the device accesses IPv4 services. In addition, the host device will 1091 create an IPv4-in-IPv6 softwire tunnel to an AFTR. The Carrier Grade 1092 NAT will reside in the service provider network.
1094 When the device accesses an IPv6 service, the device will send the IPv6 1095 datagram natively to the default gateway. 1097 When the device accesses an IPv4 service, it will source the IPv4 1098 datagram with the well-known non-routable IPv4 address. Then, the 1099 host device will encapsulate the IPv4 datagram inside the IPv4-in-IPv6 1100 softwire tunnel and send the IPv6 datagram to the AFTR. When 1101 the AFTR receives the IPv6 datagram, it will remove the IPv6 1102 header and perform IPv4-to-IPv4 NAT on the source address. 1104 This scenario works on both wireline and wireless networks. A 1105 typical wireless device will connect directly to the service provider 1106 without a home gateway in between. 1108 As illustrated in Figure 4, this dual-stack lite deployment model 1109 consists of three components: the dual-stack lite host, the AFTR and 1110 a softwire between the softwire initiator B4 in the host and the 1111 softwire concentrator in the AFTR. The dual-stack lite host is 1112 assumed to have IPv6 service and can exchange IPv6 traffic with the 1113 AFTR. 1115 The AFTR performs IPv4-IPv4 NAT translations to multiplex multiple 1116 subscribers through a pool of global IPv4 addresses. Overlapping IPv4 1117 address spaces used by the dual-stack lite hosts are disambiguated 1118 through the identification of tunnel endpoints. 1120 In this situation, the dual-stack lite host configures the IPv4 1121 address 192.0.0.2 out of the well-known range 192.0.0.0/29 (defined 1122 by IANA) on its B4 interface. It also configures the first non-reserved 1123 IPv4 address of the reserved range, 192.0.0.1, as the address 1124 of its default gateway.
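The address arithmetic behind this configuration can be checked with a short sketch using the Python standard library `ipaddress` module. The variable names are illustrative only.

```python
import ipaddress

# Well-known range reserved for dual-stack lite B4 interfaces.
B4_RANGE = ipaddress.ip_network("192.0.0.0/29")

# hosts() skips the network (192.0.0.0) and broadcast (192.0.0.7) addresses,
# leaving the usable interface addresses of the /29.
usable = list(B4_RANGE.hosts())

default_router = usable[0]  # address configured as the default gateway
b4_address = usable[1]      # address configured on the B4 interface
```

With a /29, `usable` holds six addresses, 192.0.0.1 through 192.0.0.6, which matches the six possible interfaces mentioned for a multi-homed node.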
1126 +-------------------+ 1127 | | 1128 | Host 192.0.0.2 | 1129 |+--------+--------+| 1130 || B4 || 1131 |+--------+--------+| 1132 +--------|||--------+ 1133 |||2001:0:0:1::1 1134 ||| 1135 |||<-IPv4-in-IPv6 softwire 1136 ||| 1137 -------|||------- 1138 / ||| \ 1139 | ISP core network | 1140 \ ||| / 1141 -------|||------- 1142 ||| 1143 |||2001:0:0:2::1 1144 +--------|||--------+ 1145 | AFTR | 1146 |+--------+--------+| 1147 || Concentrator || 1148 |+--------+--------+| 1149 | |NAT| | 1150 | +-+-+ | 1151 +---------|---------+ 1152 |129.0.0.1 1153 | 1154 --------|-------- 1155 / | \ 1156 | Internet | 1157 \ | / 1158 --------|-------- 1159 | 1160 |128.0.0.1 1161 +-----+-----+ 1162 | IPv4 Host | 1163 +-----------+ 1165 Figure 4: host-based architecture 1167 The resulting solution accepts an IPv4 datagram that is encapsulated 1168 into an IPv4-in-IPv6 softwire datagram for transmission across the 1169 softwire. At the corresponding endpoint, the IPv4 datagram is 1170 decapsulated, and the translated IPv4 address is inserted based on a 1171 translation from the softwire. 1173 14.2.1. Example message flow 1175 In the example shown in Figure 5, the translation table in the AFTR 1176 is configured to forward between IP/TCP (a.b.c.d/10000) and IP/TCP 1177 (129.0.0.1/5000). That is, a datagram received from the host at 1178 address 192.0.0.2, using TCP SRC port 10000, will be translated to a 1179 datagram with IP SRC address 129.0.0.1 and TCP SRC port 5000 in the 1180 Internet.
1182 +-------------------+ 1183 | | 1184 |Host 192.0.0.2 | 1185 |+--------+--------+| 1186 || B4 || 1187 |+--------+--------+| 1188 +--------|||--------+ 1189 | |||2001:0:0:1::1 1190 IPv6 datagram 1| ||| 1191 | |||<-IPv4-in-IPv6 softwire 1192 | ||| 1193 -----|-|||------- 1194 / | ||| \ 1195 | ISP core network | 1196 \ | ||| / 1197 -----|-|||------- 1198 | ||| 1199 | |||2001:0:0:2::1 1200 +------|-|||--------+ 1201 | | AFTR | 1202 | v ||| | 1203 |+--------+--------+| 1204 || Concentrator || 1205 |+--------+--------+| 1206 | |NAT| | 1207 | +-+-+ | 1208 +---------|---------+ 1209 | |129.0.0.1 1210 IPv4 datagram 2 | | 1211 -----|--|-------- 1212 / | | \ 1213 | Internet | 1214 \ | | / 1215 -----|--|-------- 1216 | | 1217 v |128.0.0.1 1218 +-----+-----+ 1219 | IPv4 Host | 1220 +-----------+ 1222 Figure 5: Outbound Datagram 1224 +-----------------+--------------+---------------+ 1225 | Datagram | Header field | Contents | 1226 +-----------------+--------------+---------------+ 1227 | IPv6 Datagram 1 | IPv6 Dst | 2001:0:0:2::1 | 1228 | | IPv6 Src | 2001:0:0:1::1 | 1229 | | IPv4 Dst | 128.0.0.1 | 1230 | | IPv4 Src | a.b.c.d | 1231 | | TCP Dst | 80 | 1232 | | TCP Src | 10000 | 1233 | --------------- | ------------ | ------------- | 1234 | IPv4 datagram 2 | IPv4 Dst | 128.0.0.1 | 1235 | | IPv4 Src | 129.0.0.1 | 1236 | | TCP Dst | 80 | 1237 | | TCP Src | 5000 | 1238 +-----------------+--------------+---------------+ 1240 Datagram header contents 1242 When sending an IPv4 packet, the dual-stack lite host encapsulates it 1243 in datagram 1 and forwards it to the AFTR over the softwire. 1245 When it receives datagram 1, the concentrator in the AFTR hands the 1246 IPv4 datagram to the NAT, which determines from its translation table 1247 that the datagram received on Softwire_1 with TCP SRC port 10000 1248 should be translated to datagram 2 with IP SRC address 129.0.0.1 and 1249 TCP SRC port 5000. 1251 Figure 6 shows an inbound message received at the AFTR.
When the NAT 1252 function in the AFTR receives datagram 1, it looks up the IP/TCP DST 1253 in its translation table. In the example in Figure 6, the NAT 1254 translates the TCP DST port to 10000, sets the IP DST address to 1255 a.b.c.d and hands the datagram to the concentrator for transmission 1256 over Softwire_1. The B4 in the dual-stack lite host decapsulates the 1257 IPv4 datagram from the inbound softwire datagram, and forwards it to 1258 the host. 1260 +-------------------+ 1261 | | 1262 |Host 192.0.0.2 | 1263 |+--------+--------+| 1264 || B4 || 1265 |+--------+--------+| 1266 +--------|||--------+ 1267 ^ |||2001:0:0:1::1 1268 IPv6 datagram 2 | ||| 1269 | |||<-IPv4-in-IPv6 softwire 1270 | ||| 1271 -----|-|||------- 1272 / | ||| \ 1273 | ISP core network | 1274 \ | ||| / 1275 -----|-|||------- 1276 | ||| 1277 | |||2001:0:0:2::1 1278 +------|-|||--------+ 1279 | AFTR | 1280 | | ||| | 1281 |+--------+--------+| 1282 || Concentrator || 1283 |+--------+--------+| 1284 | |NAT| | 1285 | +-+-+ | 1286 +---------|---------+ 1287 ^ |129.0.0.1 1288 IPv4 datagram 1 | | 1289 -----|--|-------- 1290 / | | \ 1291 | Internet | 1292 \ | | / 1293 -----|--|-------- 1294 | | 1295 | |128.0.0.1 1296 +-----+-----+ 1297 | IPv4 Host | 1298 +-----------+ 1300 Figure 6: Inbound Datagram 1302 +-----------------+--------------+---------------+ 1303 | Datagram | Header field | Contents | 1304 +-----------------+--------------+---------------+ 1305 | IPv4 datagram 1 | IPv4 Dst | 129.0.0.1 | 1306 | | IPv4 Src | 128.0.0.1 | 1307 | | TCP Dst | 5000 | 1308 | | TCP Src | 80 | 1309 | --------------- | ------------ | ------------- | 1310 | IPv6 Datagram 2 | IPv6 Dst | 2001:0:0:1::1 | 1311 | | IPv6 Src | 2001:0:0:2::1 | 1312 | | IPv4 Dst | a.b.c.d | 1313 | | IPv4 Src | 128.0.0.1 | 1314 | | TCP Dst | 10000 | 1315 | | TCP Src | 80 | 1316 +-----------------+--------------+---------------+ 1318 Datagram header contents 1320 14.2.2.
Translation details 1322 The translations happening in the AFTR are the same as in the 1323 previous examples. The well-known IPv4 address 192.0.0.2 out of the 1324 192.0.0.0/29 (defined by IANA) range used by all the hosts is 1325 disambiguated by the IPv6 source address of the softwire. 1327 +---------------------------------+--------------------+ 1328 | Softwire-Id/IPv4/Prot/Port | IPv4/Prot/Port | 1329 +---------------------------------+--------------------+ 1330 | 2001:0:0:1::1/a.b.c.d/TCP/10000 | 129.0.0.1/TCP/5000 | 1331 +---------------------------------+--------------------+ 1333 Dual-Stack lite carrier-grade NAT translation table 1335 The Softwire-Id is the IPv6 address assigned to the Dual-Stack host. 1336 Each host has a unique Softwire-Id. The source IPv4 address is one 1337 of the well-known IPv4 addresses. The AFTR could receive packets from 1338 different hosts sourced from the same well-known IPv4 address over 1339 different softwire tunnels. Similar to the gateway architecture, the 1340 AFTR combines the Softwire-Id and IPv4 address/Port [Softwire-Id, 1341 IPv4+Port] to uniquely identify the individual host. 1343 15. Appendix C: Deployment considerations 1345 15.1. AFTR service distribution and horizontal scaling 1347 One of the key benefits of the dual-stack lite technology lies in the 1348 fact that it is tunnel based. That is, tunnel end-points may be anywhere 1349 in the service provider network. 1351 Using the DHCPv6 tunnel end-point option, service providers can 1352 create groups of users sharing the same AFTR. Those groups can be 1353 merged or divided at will. This leads to a horizontally scaled 1354 solution, where more capacity is added simply by adding more boxes. 1355 As those groups of users can evolve over time, it is best to make 1356 sure that AFTRs do not require per-user configuration in order to 1357 provide service. 1359 15.2. Horizontal scaling 1361 A service provider can start by using just a few centrally located AFTRs.
1362 Later, when more capacity is needed, more boxes can be added and 1363 pushed to the edges of the access network. In case of a spike of 1364 traffic, for example during the Olympic Games or an important 1365 political event, capacity can be quickly added in any location of the 1366 network (tunnels can terminate anywhere) simply by splitting user 1367 groups. Extra capacity can be later removed when the traffic returns 1368 to normal by resetting the DHCPv6 tunnel end-point settings. 1370 15.3. High availability 1372 An important element in the design of the dual-stack lite technology 1373 is the simplicity of implementation on the customer side. A simple 1374 IPv4-in-IPv6 tunnel and a default route over it are all that is needed to 1375 get IPv4 connectivity. Dealing with high availability is the 1376 responsibility of the service provider, not the customer devices 1377 implementing dual-stack lite. As such, a single IPv6 address of the 1378 tunnel end-point is provided in the DHCPv6 option defined in 1379 [I-D.ietf-softwire-ds-lite-tunnel-option]. The service provider can 1380 use techniques such as anycast or various types of clusters to ensure 1381 availability of the IPv4 service. The exact synchronization (or lack 1382 thereof) between redundant AFTRs is out of scope for this document. 1384 16. References 1386 16.1. Normative references 1388 [I-D.ietf-softwire-ds-lite-tunnel-option] 1389 Hankins, D. and T. Mrugalski, "Dynamic Host Configuration 1390 Protocol for IPv6 (DHCPv6) Options for Dual-Stack Lite", 1391 draft-ietf-softwire-ds-lite-tunnel-option-01 (work in 1392 progress), January 2010. 1394 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1395 Requirement Levels", BCP 14, RFC 2119, March 1997. 1397 [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in 1398 IPv6 Specification", RFC 2473, December 1998. 1400 [RFC2474] Nichols, K., Blake, S., Baker, F., and D.
Black, 1401 "Definition of the Differentiated Services Field (DS 1402 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1403 December 1998. 1405 [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms 1406 for IPv6 Hosts and Routers", RFC 4213, October 2005. 1408 [RFC5625] Bellis, R., "DNS Proxy Implementation Guidelines", 1409 BCP 152, RFC 5625, August 2009. 1411 16.2. Informative references 1413 [I-D.bajko-v6ops-port-restricted-ipaddr-assign] 1414 Bajko, G. and T. Savolainen, "Port Restricted IP Address 1415 Assignment", 1416 draft-bajko-v6ops-port-restricted-ipaddr-assign-02 (work 1417 in progress), November 2008. 1419 [I-D.cheshire-nat-pmp] 1420 Cheshire, S., "NAT Port Mapping Protocol (NAT-PMP)", 1421 draft-cheshire-nat-pmp-03 (work in progress), April 2008. 1423 [I-D.droms-softwires-snat] 1424 Droms, R. and B. Haberman, "Softwires Network Address 1425 Translation (SNAT)", draft-droms-softwires-snat-01 (work 1426 in progress), July 2008. 1428 [I-D.durand-dual-stack-lite] 1429 Durand, A., "Dual-stack lite broadband deployments post 1430 IPv4 exhaustion", draft-durand-dual-stack-lite-00 (work in 1431 progress), July 2008. 1433 [I-D.ford-shared-addressing-issues] 1434 Ford, M., Boucadair, M., Durand, A., Levis, P., and P. 1435 Roberts, "Issues with IP Address Sharing", 1436 draft-ford-shared-addressing-issues-01 (work in progress), 1437 October 2009. 1439 [I-D.nishitani-cgn] 1440 Nishitani, T., Yamagata, I., Miyakawa, S., Nakagawa, A., 1441 and H. Ashida, "Common Functions of Large Scale NAT 1442 (LSN)", draft-nishitani-cgn-03 (work in progress), 1443 November 2009. 1445 [I-D.templin-seal] 1446 Templin, F., "The Subnetwork Encapsulation and Adaptation 1447 Layer (SEAL)", draft-templin-seal-23 (work in progress), 1448 August 2008. 1450 [I-D.ymbk-aplusp] 1451 Bush, R., "The A+P Approach to the IPv4 Address Shortage", 1452 draft-ymbk-aplusp-05 (work in progress), October 2009. 1454 [RFC1191] Mogul, J. and S. 
Deering, "Path MTU discovery", RFC 1191, 1455 November 1990. 1457 [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and 1458 E. Lear, "Address Allocation for Private Internets", 1459 BCP 5, RFC 1918, February 1996. 1461 [RFC2663] Srisuresh, P. and M. Holdrege, "IP Network Address 1462 Translator (NAT) Terminology and Considerations", 1463 RFC 2663, August 1999. 1465 [RFC2993] Hain, T., "Architectural Implications of NAT", RFC 2993, 1466 November 2000. 1468 [RFC4787] Audet, F. and C. Jennings, "Network Address Translation 1469 (NAT) Behavioral Requirements for Unicast UDP", BCP 127, 1470 RFC 4787, January 2007. 1472 [RFC5382] Guha, S., Biswas, K., Ford, B., Sivakumar, S., and P. 1473 Srisuresh, "NAT Behavioral Requirements for TCP", BCP 142, 1474 RFC 5382, October 2008. 1476 [RFC5508] Srisuresh, P., Ford, B., Sivakumar, S., and S. Guha, "NAT 1477 Behavioral Requirements for ICMP", BCP 148, RFC 5508, 1478 April 2009. 1480 [RFC5571] Storer, B., Pignataro, C., Dos Santos, M., Stevant, B., 1481 Toutain, L., and J. Tremblay, "Softwire Hub and Spoke 1482 Deployment Framework with Layer Two Tunneling Protocol 1483 Version 2 (L2TPv2)", RFC 5571, June 2009. 1485 [UPnP-IGD] 1486 UPnP Forum, "Universal Plug and Play Internet Gateway 1487 Device Standardized Gateway Device Protocol", 1488 September 2006, 1489 . 1491 Author's Address 1493 Alain Durand (editor) 1494 Comcast 1495 1, Comcast center 1496 Philadelphia, PA 19103 1497 USA 1499 Email: alain_durand@cable.comcast.com