Internet Engineering Task Force                                A. Durand
Internet-Draft                                          Juniper Networks
Intended status: Standards Track                                R. Droms
Expires: February 12, 2011                                         Cisco
                                                             J. Woodyatt
                                                                   Apple
                                                                  Y. Lee
                                                                 Comcast
                                                         August 11, 2010

    Dual-Stack Lite Broadband Deployments Following IPv4 Exhaustion
                 draft-ietf-softwire-dual-stack-lite-06

Abstract

   This document revisits the dual-stack model and introduces the
   dual-stack lite technology aimed at better aligning the costs and
   benefits of deploying IPv6 in service provider networks. Dual-stack
   lite enables a broadband service provider to share IPv4 addresses
   among customers by combining two well-known technologies: IP in IP
   (IPv4-in-IPv6) and Network Address Translation (NAT).

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 12, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements language
   3.  Terminology
   4.  Deployment scenarios
     4.1.  Access model
     4.2.  CPE
     4.3.  Directly connected device
   5.  B4 element
     5.1.  Definition
     5.2.  Encapsulation
     5.3.  Fragmentation and Reassembly
     5.4.  AFTR discovery
     5.5.  DNS
     5.6.  Interface initialization
     5.7.  Well-known IPv4 address
   6.  AFTR element
     6.1.  Definition
     6.2.  Encapsulation
     6.3.  Fragmentation and Reassembly
     6.4.  DNS
     6.5.  Well-known IPv4 address
     6.6.  Extended binding table
   7.  Network Considerations
     7.1.  Tunneling
     7.2.  VPN
     7.3.  Multicast considerations
   8.  NAT considerations
     8.1.  NAT pool
     8.2.  NAT conformance
     8.3.  Application Level Gateways (ALG)
     8.4.  Port allocation
       8.4.1.  How many ports per customer?
       8.4.2.  Dynamic port assignment considerations
       8.4.3.  Subscriber controlled port assignment
     8.5.  Other considerations about sharing global IPv4 addresses
   9.  Acknowledgements
   10. IANA Considerations
   11. Security Considerations
   12. Appendix A: Deployment considerations
     12.1.  AFTR service distribution and horizontal scaling
     12.2.  Horizontal scaling
     12.3.  High availability
     12.4.  Logging
   13. Appendix B: Examples
     13.1.  Gateway based architecture
       13.1.1.  Example message flow
       13.1.2.  Translation details
     13.2.  Host based architecture
       13.2.1.  Example message flow
       13.2.2.  Translation details
   14. Appendix C: Related DS-Lite work on port management
   15. References
     15.1.  Normative references
     15.2.  Informative references
   Authors' Addresses

1.  Introduction

   The common thinking for more than 10 years has been that the
   transition to IPv6 would be based solely on the dual-stack model and
   that most things would be converted this way before we ran out of
   IPv4. However, this has not happened. The IANA free pool of IPv4
   addresses will be depleted soon, well before sufficient IPv6
   deployment exists. As a result, many IPv4 services have to
   continue to be provided even under severely limited address space.

   This document specifies the dual-stack lite technology, which is
   aimed at better aligning the costs and benefits in service provider
   networks. Dual-stack lite will enable both continued support for
   IPv4 services and incentives for the deployment of IPv6. It also
   decouples IPv6 deployment in the service provider network from the
   rest of the Internet, making incremental deployment easier.

   Dual-stack lite enables a broadband service provider to share IPv4
   addresses among customers by combining two well-known technologies:
   IP in IP (IPv4-in-IPv6) and NAT.

   This document makes a distinction between a dual-stack capable and a
   dual-stack provisioned device. The former is a device that has code
   that implements both IPv4 and IPv6, from the network layer to the
   applications. The latter is a similar device that has been
   provisioned with both an IPv4 and an IPv6 address on its
   interface(s). This document will also further refine this notion by
   distinguishing interfaces provisioned directly by the service
   provider from those provisioned by the customer.

   Pure IPv6-only devices (i.e.
   devices that do not include an IPv4
   stack) are outside of the scope of this document.

   This document will first present some deployment scenarios and then
   define the behavior of the two elements of the dual-stack lite
   technology: the B4 and the AFTR. It will then go into networking and
   NAT considerations.

2.  Requirements language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Terminology

   The technology described in this document is known as dual-stack
   lite. The abbreviation DS-Lite is used throughout this text.

   This document also introduces two new terms: the DS-Lite Basic
   Bridging BroadBand element (B4) and the DS-Lite Address Family
   Transition Router element (AFTR).

   Dual-stack is defined in [RFC4213].

   NAT related terminology is defined in [RFC4787].

   CPE stands for Customer Premises Equipment. This is the layer 3
   device in the customer premises that is connected to the service
   provider network. That device is often a home gateway. However,
   sometimes computers are directly attached to the service provider
   network. In such cases, those computers can be viewed as CPEs as
   well.

4.  Deployment scenarios

4.1.  Access model

   Instead of relying on a cascade of NATs, the dual-stack lite model is
   built on IPv4-in-IPv6 tunnels to cross the network to reach a
   carrier-grade IPv4-IPv4 NAT (the AFTR) where customers will share
   IPv4 addresses. There are a number of benefits to this approach:

   o  This technology decouples the deployment of IPv6 in the service
      provider network (up to the customer premises equipment or CPE)
      from the deployment of IPv6 in the global Internet and in customer
      applications and devices.

   o  The management of the service provider access networks is
      simplified by leveraging the large IPv6 address space.
      Overlapping private IPv4 address spaces are not required to
      support very large customer bases.

   o  As tunnels can terminate anywhere in the service provider network,
      this architecture lends itself to horizontal scaling and provides
      great flexibility to adapt to changing traffic load.

   o  Tunnels provide a direct connection between B4 and the AFTR. This
      can be leveraged to enable customers and their applications to
      control how the NAT function of the AFTR is performed.

   A key characteristic of this approach is that communications between
   end-nodes stay within their address family. IPv6 sources only
   communicate with IPv6 destinations; IPv4 sources only communicate
   with IPv4 destinations. There is no protocol family translation
   involved in this approach. This greatly simplifies the task of
   applications that may carry literal IP addresses in their payload.

4.2.  CPE

   This section describes home local area networks characterized by the
   presence of a home gateway, or CPE, provisioned only with IPv6 by the
   service provider.

   A DS-Lite CPE is an IPv6-aware CPE with a B4 interface implemented in
   the WAN interface.

   A DS-Lite CPE SHOULD NOT operate a NAT function between an internal
   interface and a B4 interface, as the NAT function will be performed
   by the AFTR in the service provider's network. That will avoid
   accidentally operating in a double NAT environment.

   However, it SHOULD operate its own DHCP(v4) server handing out
   [RFC1918] address space (e.g. 192.168.0.0/16) to hosts in the home.
   It SHOULD advertise itself as the default IPv4 router to those home
   hosts. It SHOULD also advertise itself as a DNS server in the DHCP
   Option 6 (DNS Server).
   Additionally, it SHOULD operate a DNS proxy
   to accept DNS IPv4 requests from home hosts and send them using IPv6
   to the service provider DNS servers, as described in Section 5.5.

   Note: if an IPv4 home host decides to use another IPv4 DNS server,
   the DS-Lite CPE will forward those DNS requests via the B4 interface,
   the same way it forwards any regular IPv4 packets. However, each DNS
   request will create a binding in the AFTR. A large number of DNS
   requests may have a direct impact on the AFTR's NAT table
   utilization.

   IPv6 capable devices directly reach the IPv6 Internet. Packets
   simply follow IPv6 routing; they do not go through the tunnel and
   are not subject to any translation. It is expected that most IPv6
   capable devices will also be IPv4 capable and will simply be
   configured with an IPv4 RFC1918 style address within the home network
   and access the IPv4 Internet the same way as the legacy IPv4-only
   devices within the home.

   Pure IPv6-only devices (i.e. devices that do not include an IPv4
   stack) are outside of the scope of this document.

4.3.  Directly connected device

   In broadband home networks, devices are sometimes directly connected
   to the broadband service provider. They are connected straight to a
   modem, without a home gateway. Those devices are, in fact, acting as
   CPEs.

   Under this scenario, the customer device is a dual-stack capable host
   that is provisioned by the service provider only with IPv6. The
   device itself acts as a B4 element and the IPv4 service is provided
   by an IPv4-in-IPv6 tunnel, just as in the home gateway/CPE case.
   That device can run any combination of IPv4 and/or IPv6
   applications.

   A directly connected DS-Lite device SHOULD send its DNS requests over
   IPv6 to the IPv6 DNS server it has been configured to use.
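
   The forwarding rule shared by the CPE and directly connected cases
   can be pictured with a minimal sketch. It is an illustration only,
   not part of the specification: the function name and the two path
   labels are hypothetical, and a real B4 element makes this decision
   in its forwarding plane rather than in application code.

```python
import ipaddress

# Hypothetical labels for the two paths; not defined by the draft.
NATIVE_IPV6 = "route-natively-over-ipv6"
B4_TUNNEL = "encapsulate-in-ipv4-in-ipv6-tunnel"


def select_path(destination: str) -> str:
    """Pick the path a DS-Lite CPE or directly connected host would use.

    IPv6 packets follow ordinary IPv6 routing and are never translated.
    IPv4 packets have no native path (the WAN side is IPv6-only), so
    they are handed to the B4 element for IPv4-in-IPv6 encapsulation.
    """
    address = ipaddress.ip_address(destination)
    if address.version == 6:
        return NATIVE_IPV6
    return B4_TUNNEL


# Documentation prefixes are used for the sample destinations.
print(select_path("2001:db8::53"))
print(select_path("198.51.100.7"))
```

   The point of the sketch is that the address family alone selects the
   path; no protocol family translation takes place on either branch.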

   As in the previous section, IPv6 packets follow IPv6 routing; they
   do not go through the tunnel and are not subject to any
   translation.

   The support of IPv4-only devices and IPv6-only devices in this
   scenario is out of scope for this document.

5.  B4 element

5.1.  Definition

   The B4 element is a function implemented on a dual-stack capable
   node, either a directly connected device or a CPE, that creates a
   tunnel to an AFTR.

5.2.  Encapsulation

   The tunnel is a multipoint-to-point IPv4-in-IPv6 tunnel ending on a
   service provider AFTR.

   See Section 7.1 for additional tunneling considerations.

   Note: at this point, DS-Lite only defines IPv4-in-IPv6 tunnels;
   however, other types of encapsulation could be defined in the future.

5.3.  Fragmentation and Reassembly

   Using an encapsulation (IPv4-in-IPv6 or anything else) to carry IPv4
   traffic over IPv6 will reduce the effective MTU of the datagram.
   Unfortunately, path MTU discovery [RFC1191] is not a reliable method
   to deal with this problem.

   A solution to deal with this problem is for the service provider to
   increase the MTU size of all the links between the B4 element and the
   AFTR elements by at least 40 bytes to accommodate both the IPv6
   encapsulation header and the IPv4 datagram without fragmenting the
   IPv6 packet.

   However, as not all service providers will be able to increase their
   link MTU, the B4 element MUST perform fragmentation and reassembly if
   the outgoing link MTU cannot accommodate the extra IPv6 header.
   Fragmentation MUST happen after the encapsulation on the IPv6 packet.
   Reassembly MUST happen before the decapsulation of the IPv6 header.
   The detailed procedure is specified in [RFC2473] Section 7.2.

5.4.  AFTR discovery

   In order to configure the IPv4-in-IPv6 tunnel, the B4 element needs
   the IPv6 address of the AFTR element.
   This IPv6 address can be
   configured using a variety of methods, such as an out-of-band
   mechanism, manual configuration, or a DHCPv6 option.

   In order to guarantee interoperability, a B4 element SHOULD implement
   the DHCPv6 option defined in
   [I-D.ietf-softwire-ds-lite-tunnel-option].

5.5.  DNS

   A B4 element is configured by the service provider with IPv6 only.
   As such, it can only learn the address of a DNS recursive server
   through DHCPv6 (or other similar method over IPv6). As DHCPv6 only
   defines an option to get the IPv6 address of such a DNS recursive
   server, the B4 element cannot easily discover the IPv4 address of
   such a recursive DNS server, and as such will have to perform all DNS
   resolution over IPv6.

   The B4 element can pass this IPv6 address to downstream IPv6 nodes,
   but not to downstream IPv4 nodes. As such, the B4 element SHOULD
   implement a DNS proxy, following the recommendations of [RFC5625].

5.6.  Interface initialization

   Initialization of the interface including a B4 element is out of
   scope for this specification.

5.7.  Well-known IPv4 address

   Any locally unique IPv4 address could be configured on the
   IPv4-in-IPv6 tunnel to represent the B4 element. Configuring such an
   address is often necessary when the B4 element is sourcing IPv4
   datagrams directly over the tunnel. In order to avoid conflicts with
   any other address, IANA has defined a well-known range, 192.0.0.0/29.

   192.0.0.0 is the reserved subnet address. 192.0.0.1 is reserved for
   the AFTR element. The B4 element MAY use any other address within
   the 192.0.0.0/29 range.

   Note: a range of addresses has been reserved for this purpose. The
   intent is to accommodate nodes implementing multiple B4 elements.

6.  AFTR element

6.1.  Definition

   An AFTR element is the combination of an IPv4-in-IPv6 tunnel
   end-point and an IPv4-IPv4 NAT implemented on the same node.

6.2.  Encapsulation

   The tunnel is a point-to-multipoint IPv4-in-IPv6 tunnel ending at the
   B4 elements.

   See Section 7.1 for additional tunneling considerations.

   Note: at this point, DS-Lite only defines IPv4-in-IPv6 tunnels;
   however, other types of encapsulation could be defined in the future.

6.3.  Fragmentation and Reassembly

   As noted previously, fragmentation and reassembly need to be taken
   care of by the tunnel end-points. As such, the AFTR MUST perform
   fragmentation and reassembly if the underlying link MTU cannot
   accommodate the extra IPv6 header of the tunnel. Fragmentation MUST
   happen after the encapsulation on the IPv6 packet. Reassembly MUST
   happen before the decapsulation of the IPv6 header. The detailed
   procedure is specified in [RFC2473] Section 7.2.

   Fragmentation at the Tunnel Entry-Point is a light-weight operation.
   In contrast, reassembly at the Tunnel Exit-Point can be expensive.
   When the Tunnel Exit-Point receives the first fragment, it must wait
   for the second fragment to arrive in order to reassemble the two
   fragmented IPv6 packets for decapsulation. This requires the Tunnel
   Exit-Point to buffer and keep track of fragmented packets. Consider
   that the AFTR is the Tunnel Exit-Point for many tunnels. If many
   clients simultaneously source a large number of fragmented packets
   to the AFTR, this will require the AFTR to buffer and consume
   enormous resources to keep track of the flows. This reassembly
   process will significantly impact the AFTR performance. However,
   this impact only happens when many clients simultaneously
   source large IPv4 packets.
   Since we believe that the majority of
   clients will receive large IPv4 packets (such as watching video
   streams) rather than source large IPv4 packets (such as sourcing
   video streams), reassembly is only a fraction of the AFTR's overall
   workload.

   Methods to avoid fragmentation, such as rewriting the TCP MSS option
   or using technologies such as Subnetwork Encapsulation and Adaptation
   Layer defined in [I-D.templin-seal], are out of scope for this
   document.

6.4.  DNS

   As noted previously, a DS-Lite node implementing a B4 element will
   perform DNS resolution over IPv6. As such, very few, if any, DNS
   packets will flow through the AFTR element.

6.5.  Well-known IPv4 address

   The AFTR MAY use the well-known IPv4 address 192.0.0.1 reserved by
   IANA to configure the IPv4-in-IPv6 tunnel. That address can then be
   used to report ICMP problems and will appear in traceroute outputs.

6.6.  Extended binding table

   The NAT binding table of the AFTR element is extended to include the
   source IPv6 address of the incoming packets. This IPv6 address is
   used to disambiguate between the overlapping IPv4 address space of
   the service provider customers.

   By doing a reverse look-up in the extended IPv4 NAT binding table,
   the AFTR knows how to reconstruct the IPv6 encapsulation when
   packets come back from the Internet. That way, there is no need to
   keep a static configuration for each tunnel.

7.  Network Considerations

7.1.  Tunneling

   Tunneling MUST be done in accordance with [RFC2473] and [RFC4213].
   Traffic classes ([RFC2474]) from the IPv4 headers SHOULD be carried
   over to the IPv6 headers and vice versa.

7.2.  VPN

   Dual-stack lite implementations SHOULD NOT interfere with the
   functioning of IPv4 or IPv6 VPNs.

7.3.  Multicast considerations

   Multicast is out of scope for this document.

8.  NAT considerations

8.1.  NAT pool

   AFTRs MAY operate distinct, non-overlapping NAT pools. Those NAT
   pools do not have to be contiguous.

8.2.  NAT conformance

   A dual-stack lite AFTR SHOULD implement behavior conforming to the
   best current practice, currently documented in [RFC4787], [RFC5382]
   and [RFC5508]. Other requirements for AFTRs can be found in
   [I-D.nishitani-cgn].

8.3.  Application Level Gateways (ALG)

   The AFTR should only perform a minimum number of ALGs for the classic
   applications such as FTP, RTSP/RTP, IPsec and PPTP VPN pass-through
   and enable the users to use their own ALG on statically or
   dynamically reserved ports instead.

8.4.  Port allocation

8.4.1.  How many ports per customer?

   Because IPv4 addresses will be shared among customers and potentially
   a large address space reduction factor may be applied, on average,
   only a limited number N of TCP or UDP port numbers will be available
   per customer. This means that applications opening a very large
   number of TCP ports may have a harder time working. For example, it
   has been reported that a very well-known web site was using AJAX
   techniques and was opening up to 69 TCP ports per web page. If we
   make the hypothesis of an address space reduction of a factor of 100
   (one IPv4 address per 100 customers), and 65k ports available per
   IPv4 address, that makes an average of N = 650 ports available
   simultaneously to be shared among the various devices behind the
   dual-stack lite tunnel end-point.

   There is an important operational difference if those N ports are
   pre-allocated in a cookie-cutter fashion versus allocated on demand
   by incoming connections. This is a difference between an average of
   N ports and a maximum of N ports. Several service providers have
   reported an average number of connections per customer in the single
   digits.
   At the opposite end, thousands or tens of thousands of ports
   could be used at peak by any single customer browsing a number of
   AJAX/Web 2.0 sites.

   As such, service providers allocating a fixed number of ports per
   user should dimension the system with a minimum of N = several
   thousands of ports for every user. This would bring the address
   space reduction ratio to a single digit. Service providers using a
   smaller number of ports per user (N in the hundreds) should expect
   customers' applications to break in a more or less random way over
   time.

   In order to achieve higher address space reduction ratios, it is
   recommended that service providers do not use this cookie-cutter
   approach, and, on the contrary, allocate ports as dynamically as
   possible, just like on a regular NAT. With an average number of
   connections per customer in the single digits, having an address
   space reduction of a factor of 100 is realistic. However, service
   providers should exercise caution and make sure their pool of port
   numbers does not go too low. The actual maximum address space
   reduction factor is unknown at this time.

8.4.2.  Dynamic port assignment considerations

   When dynamic port assignment is used to maximize the number of
   subscribers sharing the AFTR global IPv4 addresses, the AFTR should
   implement checks to avoid denial-of-service (DoS) attacks through
   exhaustion of available ports. It should also avoid mapping any one
   subscriber's "flows" across more than one global IPv4 address.

8.4.3.  Subscriber controlled port assignment

   Dynamic port assignment precludes inbound access to subscriber
   servers, just as in a CPE NAT. Inbound access to subscriber servers
   can be provided through pre-assigned and/or reserved port mappings in
   the AFTR. Specifying the mechanisms for managing and signaling these
   reserved port mappings is out of scope for this document.

8.5.  Other considerations about sharing global IPv4 addresses

   More considerations on sharing the port space of IPv4 addresses can
   be found in [I-D.ford-shared-addressing-issues].

9.  Acknowledgements

   The authors would like to acknowledge the role of Mark Townsley for
   his input on the overall architecture of this technology by pointing
   this work in the direction of [I-D.droms-softwires-snat]. Note that
   this document results from a merging of [I-D.durand-dual-stack-lite]
   and [I-D.droms-softwires-snat]. Also to be acknowledged are the many
   discussions with a number of people including Shin Miyakawa,
   Katsuyasu Toyama, Akihide Hiura, Takashi Uematsu, Tetsutaro Hara,
   Yasunori Matsubayashi, Ichiro Mizukoshi. The authors would also like
   to thank David Ward, Jari Arkko, Thomas Narten and Geoff Huston for
   their constructive feedback. Special thanks go to Dave Thaler and
   Dan Wing for their reviews and comments.

10.  IANA Considerations

   This draft requests that IANA allocate a well-known IPv4 network
   prefix, 192.0.0.0/29. That range is used to number the dual-stack
   lite interfaces. Reserving a /29 allows for 6 possible interfaces on
   a multi-homed node. The IPv4 address 192.0.0.1 is reserved as the
   IPv4 address of the default router for such dual-stack lite hosts.

11.  Security Considerations

   Security issues associated with NAT have long been documented. See
   [RFC2663] and [RFC2993].

   However, moving the NAT functionality from the CPE to the core of the
   service provider network and sharing IPv4 addresses among customers
   create additional requirements when logging data for abuse usage.
   With any architecture where an IPv4 address does not uniquely
   represent an end host, IPv4 addresses and timestamps are no longer
   sufficient to identify a particular broadband customer. Additional
   information such as transport protocol information will be required
   for that purpose.
   For example, we suggest logging the transport port
   number for TCP and UDP connections.

   The AFTR performs translation functions for interior IPv4 hosts at
   RFC 1918 addresses or at the IANA reserved address range (TBA by
   IANA). If the interior host is properly using the authorized IPv4
   address with the authorized transport protocol port range such as A+P
   semantics for the tunnel, the AFTR can simply forward without
   translation to permit the authorized address and port range to
   function properly. All packets with unauthorized interior IPv4
   addresses or with authorized interior IPv4 address but unauthorized
   port range MUST NOT be forwarded by the AFTR. This prevents rogue
   devices from launching denial of service attacks using unauthorized
   public IPv4 addresses in the IPv4 source header field or unauthorized
   transport port range in the IPv4 transport header field. For
   example, rogue devices could bombard a public web server by launching
   a TCP SYN ACK attack. The victim will receive TCP SYN from random
   IPv4 source addresses at a rapid rate and deny TCP services to
   legitimate users.

   With IPv4 addresses shared by multiple users, ports become a critical
   resource. As such, some mechanisms need to be put in place by an
   AFTR to limit port usage, either by rate-limiting new connections or
   putting a hard limit on the maximum number of ports usable by a
   single user. If this number is high enough, it should not interfere
   with normal usage and still provide reasonable protection of the
   shared pool. More considerations on port allocation and port
   exhaustion can be found in Section 8.4.

   More considerations on sharing IPv4 addresses can be found in
   [I-D.ford-shared-addressing-issues].

   AFTRs should support ways to limit service to registered customers.
   If strict IPv6 ingress filtering is deployed in the broadband network
   to prevent IPv6 address spoofing and dual-stack lite service is
   restricted to those customers, then tunnels terminating at the AFTR
   and coming from registered customer IPv6 addresses cannot be spoofed.
   Thus a simple access control list on the tunnel transport source
   address is all that is required to accept traffic on the southbound
   interface of an AFTR.

   If IPv6 address spoofing prevention is not in place, the AFTR should
   perform further sanity checks on the IPv6 address of incoming IPv6
   packets. For example, it should check if the address has really been
   allocated to an authorized customer.

12.  Appendix A: Deployment considerations

12.1.  AFTR service distribution and horizontal scaling

   One of the key benefits of the dual-stack lite technology lies in the
   fact that it is tunnel based. That is, tunnel end-points may be
   anywhere in the service provider network.

   Using the DHCPv6 tunnel end-point option, service providers can
   create groups of users sharing the same AFTR. Those groups can be
   merged or divided at will. This leads to a horizontally scaled
   solution, where more capacity is added simply by adding more boxes.
   As those groups of users can evolve over time, it is best to make
   sure that AFTRs do not require per-user configuration in order to
   provide service.

12.2.  Horizontal scaling

   A service provider can start using just a few centrally located
   AFTRs. Later, when more capacity is needed, more boxes can be added
   and pushed to the edges of the access network. In case of a spike of
   traffic, for example during the Olympic Games or an important
   political event, capacity can be quickly added in any location of the
   network (tunnels can terminate anywhere) simply by splitting user
   groups.
Extra capacity can later be removed when the traffic returns 640 to normal by resetting the DHCPv6 tunnel end-point settings. 642 12.3. High availability 644 An important element in the design of the dual-stack lite technology 645 is the simplicity of implementation on the customer side. A simple 646 IPv4-in-IPv6 tunnel and a default route over it is all that is needed to 647 get IPv4 connectivity. Dealing with high availability is the 648 responsibility of the service provider, not the customer devices 649 implementing dual-stack lite. As such, a single IPv6 address of the 650 tunnel end-point is provided in the DHCPv6 option defined in 651 [I-D.ietf-softwire-ds-lite-tunnel-option]. The service provider can 652 use techniques such as anycast or various types of clusters to ensure 653 availability of the IPv4 service. The exact synchronization (or lack 654 thereof) between redundant AFTRs is out of scope for this document. 656 12.4. Logging 658 A DS-Lite AFTR implementation should offer the possibility to log NAT 659 binding creations or other ways to keep track of the ports/IP 660 addresses used by customers. This is both to support 661 troubleshooting, which is very important to service providers trying 662 to figure out why something may not be working, as well as to meet 663 region-specific requirements for responding to legally-binding 664 requests for information from law enforcement authorities. 666 13. Appendix B: Examples 667 13.1. Gateway based architecture 669 This architecture is targeted at residential broadband deployments 670 but can be adapted easily to other types of deployment where the 671 installed base of IPv4-only devices is important. 673 Consider a scenario where a Dual-Stack lite CPE is provisioned only 674 with IPv6 in the WAN port, no IPv4. The CPE acts as an IPv4 DHCP 675 server for the LAN network (wireline and wireless) handing out 676 RFC1918 addresses.
In addition, the CPE may support IPv6 Auto- 677 Configuration and/or act as a DHCPv6 server for the LAN network. When an 678 IPv4-only device connects to the CPE, the CPE will hand it an 679 RFC1918 address. When a dual-stack capable device connects to the 680 CPE, the CPE will hand out an RFC1918 address and a global IPv6 681 address to the device. In addition, the CPE will create an IPv4-in-IPv6 682 softwire tunnel [RFC5571] to an AFTR that resides in the service 683 provider network. 685 When the device accesses IPv6 service, it will send the IPv6 datagram 686 to the CPE natively. The CPE will route the traffic upstream to the 687 default gateway. 689 When the device accesses IPv4 service, it will source the IPv4 690 datagram with the RFC1918 address and send the IPv4 datagram to the 691 CPE. The CPE will encapsulate the IPv4 datagram inside the IPv4-in- 692 IPv6 softwire tunnel and forward the IPv6 datagram to the AFTR. This 693 contrasts with what the CPE normally does today, which is to NAT the RFC1918 694 address to the public IPv4 address and route the datagram upstream. 695 When the AFTR receives the IPv6 datagram, it will decapsulate the 696 IPv6 header and perform an IPv4-to-IPv4 NAT on the source address. 698 As illustrated in Figure 1, this dual-stack lite deployment model 699 consists of three components: the dual-stack lite home router with a 700 B4 element, the AFTR and a softwire between the B4 element acting as 701 softwire initiator (SI) [RFC5571] in the dual-stack lite home router 702 and the softwire concentrator (SC) [RFC5571] in the AFTR. The AFTR 703 performs IPv4-IPv4 NAT translations to multiplex multiple subscribers 704 through a pool of global IPv4 addresses. Overlapping address spaces 705 used by subscribers are disambiguated through the identification of 706 tunnel endpoints.
708 +-----------+ 709 | Host | 710 +-----+-----+ 711 |10.0.0.1 712 | 713 | 714 |10.0.0.2 715 +---------|---------+ 716 | | | 717 | Home router | 718 |+--------+--------+| 719 || B4 || 720 |+--------+--------+| 721 +--------|||--------+ 722 |||2001:db8:0:1::1 723 ||| 724 |||<-IPv4-in-IPv6 softwire 725 ||| 726 -------|||------- 727 / ||| \ 728 | ISP core network | 729 \ ||| / 730 -------|||------- 731 ||| 732 |||2001:db8:0:2::1 733 +--------|||--------+ 734 | AFTR | 735 |+--------+--------+| 736 || Concentrator || 737 |+--------+--------+| 738 | |NAT| | 739 | +-+-+ | 740 +---------|---------+ 741 |192.0.2.1 742 | 743 --------|-------- 744 / | \ 745 | Internet | 746 \ | / 747 --------|-------- 748 | 749 |198.51.100.1 750 +-----+-----+ 751 | IPv4 Host | 752 +-----------+ 754 Figure 1: gateway-based architecture 756 Notes: 758 o The dual-stack lite home router is not required to be on the same 759 link as the host 761 o The dual-stack lite home router could be replaced by a dual-stack 762 lite router in the service provider network 764 The resulting solution accepts an IPv4 datagram that is translated 765 into an IPv4-in-IPv6 softwire datagram for transmission across the 766 softwire. At the corresponding endpoint, the IPv4 datagram is 767 decapsulated, and the translated IPv4 address is inserted based on a 768 translation from the softwire. 770 13.1.1. Example message flow 772 In the example shown in Figure 2, the translation table in the AFTR 773 is configured to forward between IP/TCP (10.0.0.1/10000) and IP/TCP 774 (192.0.2.1/5000). That is, a datagram received by the dual-stack 775 lite home router from the host at address 10.0.0.1, using TCP SRC 776 port 10000 will be translated to a datagram with IP SRC address 777 192.0.2.1 and TCP SRC port 5000 in the Internet.
779 +-----------+ 780 | Host | 781 +-----+-----+ 782 | |10.0.0.1 783 IPv4 datagram 1 | | 784 | | 785 v |10.0.0.2 786 +---------|---------+ 787 | | | 788 | home router | 789 |+--------+--------+| 790 || B4 || 791 |+--------+--------+| 792 +--------|||--------+ 793 | |||2001:db8:0:1::1 794 IPv6 datagram 2| ||| 795 | |||<-IPv4-in-IPv6 softwire 796 | ||| 797 -----|-|||------- 798 / | ||| \ 799 | ISP core network | 800 \ | ||| / 801 -----|-|||------- 802 | ||| 803 | |||2001:db8:0:2::1 804 +------|-|||--------+ 805 | | AFTR | 806 | v ||| | 807 |+--------+--------+| 808 || Concentrator || 809 |+--------+--------+| 810 | |NAT| | 811 | +-+-+ | 812 +---------|---------+ 813 | |192.0.2.1 814 IPv4 datagram 3 | | 815 | | 816 -----|--|-------- 817 / | | \ 818 | Internet | 819 \ | | / 820 -----|--|-------- 821 | | 822 v |198.51.100.1 823 +-----+-----+ 824 | IPv4 Host | 825 +-----------+ 826 Figure 2: Outbound Datagram 828 +-----------------+--------------+-----------------+ 829 | Datagram | Header field | Contents | 830 +-----------------+--------------+-----------------+ 831 | IPv4 datagram 1 | IPv4 Dst | 198.51.100.1 | 832 | | IPv4 Src | 10.0.0.1 | 833 | | TCP Dst | 80 | 834 | | TCP Src | 10000 | 835 | --------------- | ------------ | ------------- | 836 | IPv6 datagram 2 | IPv6 Dst | 2001:db8:0:2::1 | 837 | | IPv6 Src | 2001:db8:0:1::1 | 838 | | IPv4 Dst | 198.51.100.1 | 839 | | IPv4 Src | 10.0.0.1 | 840 | | TCP Dst | 80 | 841 | | TCP Src | 10000 | 842 | --------------- | ------------ | ------------- | 843 | IPv4 datagram 3 | IPv4 Dst | 198.51.100.1 | 844 | | IPv4 Src | 192.0.2.1 | 845 | | TCP Dst | 80 | 846 | | TCP Src | 5000 | 847 +-----------------+--------------+-----------------+ 849 Datagram header contents 851 When datagram 1 is received by the dual-stack lite home router, the 852 B4 function encapsulates the datagram in datagram 2 and forwards it 853 to the dual-stack lite carrier-grade NAT over the softwire.
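The encapsulation step performed by the B4 can be illustrated with a short Python sketch. It is not part of this specification: the function names, the hop limit of 64, and the zero traffic class and flow label are assumptions made for illustration only. The sketch prepends a 40-octet IPv6 header carrying Next Header value 4 (IPv4 encapsulation, as in the generic tunneling of [RFC2473]) to an opaque IPv4 datagram, using the softwire endpoint addresses shown in Figure 2.

```python
import ipaddress
import struct

IPPROTO_IPV4 = 4        # Next Header value for an encapsulated IPv4 datagram
ASSUMED_HOP_LIMIT = 64  # illustrative value, not mandated by this document


def b4_encapsulate(ipv4_datagram: bytes, b4_addr: str, aftr_addr: str) -> bytes:
    """Hypothetical helper: wrap an IPv4 datagram in a basic IPv6 header."""
    src = ipaddress.IPv6Address(b4_addr).packed    # 16 octets
    dst = ipaddress.IPv6Address(aftr_addr).packed  # 16 octets
    first_word = 6 << 28  # version 6, traffic class 0, flow label 0
    header = struct.pack("!IHBB", first_word, len(ipv4_datagram),
                         IPPROTO_IPV4, ASSUMED_HOP_LIMIT)
    return header + src + dst + ipv4_datagram


def aftr_decapsulate(ipv6_datagram: bytes):
    """Hypothetical helper: return (softwire source address, inner datagram)."""
    if ipv6_datagram[6] != IPPROTO_IPV4:
        raise ValueError("payload is not an encapsulated IPv4 datagram")
    softwire_id = ipaddress.IPv6Address(ipv6_datagram[8:24])
    return softwire_id, ipv6_datagram[40:]


# Datagram 1 is treated as opaque octets; 20 octets stand in for its header.
datagram_1 = b"\x45" + bytes(19)
datagram_2 = b4_encapsulate(datagram_1, "2001:db8:0:1::1", "2001:db8:0:2::1")
softwire_id, inner = aftr_decapsulate(datagram_2)
```

The source address recovered by the decapsulating side is what the AFTR can use as the softwire identifier when it hands the inner datagram to the NAT.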
855 When it receives datagram 2, the tunnel concentrator in the AFTR 856 hands the IPv4 datagram to the NAT, which determines from its 857 translation table that the datagram received on Softwire_1 with TCP 858 SRC port 10000 should be translated to datagram 3 with IP SRC address 859 192.0.2.1 and TCP SRC port 5000. 861 Figure 3 shows an inbound message received at the AFTR. When the NAT 862 function in the AFTR receives datagram 1, it looks up the IP/TCP DST 863 in its translation table. In the example in Figure 3, the NAT 864 translates the TCP DST port to 10000, sets the IP DST address to 865 10.0.0.1 and hands the datagram to the SC for transmission over 866 Softwire_1. The B4 in the home router decapsulates the IPv4 datagram 867 from the inbound softwire datagram, and forwards it to the host. 869 +-----------+ 870 | Host | 871 +-----+-----+ 872 ^ |10.0.0.1 873 IPv4 datagram 3 | | 874 | | 875 | |10.0.0.2 876 +---------|---------+ 877 | +-+-+ | 878 | home router | 879 |+--------+--------+| 880 || B4 || 881 |+--------+--------+| 882 +--------|||--------+ 883 ^ |||2001:db8:0:1::1 884 IPv6 datagram 2 | ||| 885 | |||<-IPv4-in-IPv6 softwire 886 | ||| 887 -----|-|||------- 888 / | ||| \ 889 | ISP core network | 890 \ | ||| / 891 -----|-|||------- 892 | ||| 893 | |||2001:db8:0:2::1 894 +------|-|||--------+ 895 | AFTR | 896 |+--------+--------+| 897 || Concentrator || 898 |+--------+--------+| 899 | |NAT| | 900 | +-+-+ | 901 +---------|---------+ 902 ^ |192.0.2.1 903 IPv4 datagram 1 | | 904 | | 905 -----|--|-------- 906 / | | \ 907 | Internet | 908 \ | | / 909 -----|--|-------- 910 | | 911 | |198.51.100.1 912 +-----+-----+ 913 | IPv4 Host | 914 +-----------+ 916 Figure 3: Inbound Datagram 918 +-----------------+--------------+-----------------+ 919 | Datagram | Header field | Contents | 920 +-----------------+--------------+-----------------+ 921 | IPv4 datagram 1 | IPv4 Dst | 192.0.2.1 | 922 | | IPv4 Src | 198.51.100.1 | 923 | | TCP Dst | 5000 | 924 | | TCP Src | 80 | 925
| --------------- | ------------ | ------------- | 926 | IPv6 Datagram 2 | IPv6 Dst | 2001:db8:0:1::1 | 927 | | IPv6 Src | 2001:db8:0:2::1 | 928 | | IPv4 Dst | 10.0.0.1 | 929 | | IPv4 Src | 198.51.100.1 | 930 | | TCP Dst | 10000 | 931 | | TCP Src | 80 | 932 | --------------- | ------------ | ------------- | 933 | IPv4 datagram 3 | IPv4 Dst | 10.0.0.1 | 934 | | IPv4 Src | 198.51.100.1 | 935 | | TCP Dst | 10000 | 936 | | TCP Src | 80 | 937 +-----------------+--------------+-----------------+ 939 Datagram header contents 941 13.1.2. Translation details 943 The AFTR has a NAT that translates between softwire/port pairs and 944 IPv4-address/port pairs. The same translation is applied to IPv4 945 datagrams received on the device's external interface and from the 946 softwire endpoint in the device. 948 In Figure 2, the translator network interface in the AFTR is on the 949 Internet, and the softwire interface connects to the dual-stack lite 950 home router. The AFTR translator is configured as follows: 952 Network interface: Translate IPv4 destination address and TCP 953 destination port to the softwire identifier and TCP destination 954 port 956 Softwire interface: Translate softwire identifier and TCP source 957 port to IPv4 source address and TCP source port 959 Here is how the translation in Figure 3 works: 961 o Datagram 1 is received on the AFTR translator network interface. 962 The translator looks up the IPv4-address/port pair in its 963 translator table, rewrites the IPv4 destination address to 964 10.0.0.1 and the TCP destination port to 10000, and hands the datagram 965 to the SC to be forwarded over the softwire. 967 o The IPv4 datagram is received on the dual-stack lite home router 968 B4. The B4 function extracts the IPv4 datagram and the dual-stack 969 lite home router forwards datagram 3 to the host.
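The two translation directions described above can be sketched in Python as a pair of lookup tables. This is a minimal illustration under stated assumptions, not an AFTR implementation: the class and method names are hypothetical, bindings are installed statically rather than allocated on demand, and timers, port limits and ICMP handling are omitted. The second binding (a host using the same RFC1918 address behind softwire 2001:db8:0:3::1, mapped to port 5001) is invented for the example to show how overlapping interior addresses stay distinct.

```python
from typing import Dict, Tuple

InsideKey = Tuple[str, str, str, int]  # (softwire-id, IPv4, protocol, port)
OutsideKey = Tuple[str, str, int]      # (IPv4, protocol, port)


class AftrNatSketch:
    """Hypothetical static binding table for illustration only."""

    def __init__(self) -> None:
        self._outbound: Dict[InsideKey, OutsideKey] = {}
        self._inbound: Dict[OutsideKey, InsideKey] = {}

    def add_binding(self, inside: InsideKey, outside: OutsideKey) -> None:
        # Keep both directions in step so lookups are symmetric.
        self._outbound[inside] = outside
        self._inbound[outside] = inside

    def translate_outbound(self, softwire_id: str, src_ip: str,
                           proto: str, src_port: int) -> OutsideKey:
        # Softwire interface: rewrite source address and source port.
        return self._outbound[(softwire_id, src_ip, proto, src_port)]

    def translate_inbound(self, dst_ip: str, proto: str,
                          dst_port: int) -> InsideKey:
        # Network interface: rewrite destination address and destination
        # port, and select the softwire to forward over.
        return self._inbound[(dst_ip, proto, dst_port)]


nat = AftrNatSketch()
nat.add_binding(("2001:db8:0:1::1", "10.0.0.1", "TCP", 10000),
                ("192.0.2.1", "TCP", 5000))
# Assumed second subscriber reusing 10.0.0.1 behind a different softwire.
nat.add_binding(("2001:db8:0:3::1", "10.0.0.1", "TCP", 10000),
                ("192.0.2.1", "TCP", 5001))

outbound = nat.translate_outbound("2001:db8:0:1::1", "10.0.0.1", "TCP", 10000)
inbound = nat.translate_inbound("192.0.2.1", "TCP", 5000)
```

Keying the interior side on the softwire identifier as well as the IPv4 address and port is what lets two subscribers use the same private address without colliding.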
971 +------------------------------------+--------------------+ 972 | Softwire-Id/IPv4/Prot/Port | IPv4/Prot/Port | 973 +------------------------------------+--------------------+ 974 | 2001:db8:0:1::1/10.0.0.1/TCP/10000 | 192.0.2.1/TCP/5000 | 975 +------------------------------------+--------------------+ 977 Dual-Stack lite carrier-grade NAT translation table 979 The Softwire-Id is the IPv6 address assigned to the Dual-Stack lite 980 CPE. Hosts behind the same Dual-Stack lite home router have the same 981 Softwire-Id. The source IPv4 address is the RFC1918 address assigned by 982 the Dual-Stack lite home router, which is unique to each host behind the 983 CPE. The AFTR would receive packets sourced from different IPv4 984 addresses in the same softwire tunnel. The AFTR combines the 985 Softwire-Id and IPv4 address/Port [Softwire-Id, IPv4+Port] to 986 uniquely identify the host behind the same Dual-Stack lite home 987 router. 989 13.2. Host based architecture 991 This architecture is targeted at new, large scale deployments of 992 dual-stack capable devices implementing a dual-stack lite interface. 994 Consider a scenario where a Dual-Stack lite host device is directly 995 connected to the service provider network. The host device is dual- 996 stack capable but is provisioned only with a global IPv6 address. In addition, 997 the host device will pre-configure a well-known IPv4 non-routable 998 address (see IANA section). This well-known IPv4 non-routable 999 address is similar to the 127.0.0.1 loopback address. Every host 1000 device implementing Dual-Stack lite will pre-configure the same 1001 address. This address will be used to source the IPv4 datagram when 1002 the device accesses IPv4 services. In addition, the host device will 1003 create an IPv4-in-IPv6 softwire tunnel to an AFTR. The Carrier Grade 1004 NAT will reside in the service provider network. 1006 When the device accesses IPv6 service, the device will send the IPv6 1007 datagram natively to the default gateway.
1009 When the device accesses IPv4 service, it will source the IPv4 1010 datagram with the well-known non-routable IPv4 address. Then, the 1011 host device will encapsulate the IPv4 datagram inside the IPv4-in- 1012 IPv6 softwire tunnel and send the IPv6 datagram to the AFTR. When 1013 the AFTR receives the IPv6 datagram, it will decapsulate the IPv6 1014 header and perform IPv4-to-IPv4 NAT on the source address. 1016 This scenario works on both wireline and wireless networks. A 1017 typical wireless device will connect directly to the service provider 1018 without a CPE in between. 1020 As illustrated in Figure 4, this dual-stack lite deployment model 1021 consists of three components: the dual-stack lite host, the AFTR and 1022 a softwire between the softwire initiator B4 in the host and the 1023 softwire concentrator in the AFTR. The dual-stack lite host is 1024 assumed to have IPv6 service and can exchange IPv6 traffic with the 1025 AFTR. 1027 The AFTR performs IPv4-IPv4 NAT translations to multiplex multiple 1028 subscribers through a pool of global IPv4 addresses. Overlapping IPv4 1029 address spaces used by the dual-stack lite hosts are disambiguated 1030 through the identification of tunnel endpoints. 1032 In this situation, the dual-stack lite host configures the IPv4 1033 address 192.0.0.2 out of the well-known range 192.0.0.0/29 (defined 1034 by IANA) on its B4 interface. It also configures the first non- 1035 reserved IPv4 address of the reserved range, 192.0.0.1, as the address 1036 of its default gateway.
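The address selection just described can be checked with a few lines of Python using the standard ipaddress module. This is only an illustration; the variable names are hypothetical, and the mapping of the first two usable addresses to the default gateway and the B4 interface simply restates the text above.

```python
import ipaddress

# Well-known range referenced in the text above.
well_known_range = ipaddress.ip_network("192.0.0.0/29")

# hosts() omits the network address (192.0.0.0) and the broadcast
# address (192.0.0.7), leaving six usable addresses.
usable = list(well_known_range.hosts())

default_gateway = usable[0]  # first non-reserved address of the range
b4_address = usable[1]       # address configured on the B4 interface
```

Because every dual-stack lite host configures the same pair of addresses, the IPv4 address alone cannot identify a host at the AFTR.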
1038 +-------------------+ 1039 | | 1040 | Host 192.0.0.2 | 1041 |+--------+--------+| 1042 || B4 || 1043 |+--------+--------+| 1044 +--------|||--------+ 1045 |||2001:db8:0:1::1 1046 ||| 1047 |||<-IPv4-in-IPv6 softwire 1048 ||| 1049 -------|||------- 1050 / ||| \ 1051 | ISP core network | 1052 \ ||| / 1053 -------|||------- 1054 ||| 1055 |||2001:db8:0:2::1 1056 +--------|||--------+ 1057 | AFTR | 1058 |+--------+--------+| 1059 || Concentrator || 1060 |+--------+--------+| 1061 | |NAT| | 1062 | +-+-+ | 1063 +---------|---------+ 1064 |192.0.2.1 1065 | 1066 --------|-------- 1067 / | \ 1068 | Internet | 1069 \ | / 1070 --------|-------- 1071 | 1072 |198.51.100.1 1073 +-----+-----+ 1074 | IPv4 Host | 1075 +-----------+ 1077 Figure 4: host-based architecture 1079 The resulting solution accepts an IPv4 datagram that is translated 1080 into an IPv4-in-IPv6 softwire datagram for transmission across the 1081 softwire. At the corresponding endpoint, the IPv4 datagram is 1082 decapsulated, and the translated IPv4 address is inserted based on a 1083 translation from the softwire. 1085 13.2.1. Example message flow 1087 In the example shown in Figure 5, the translation table in the AFTR 1088 is configured to forward between IP/TCP (a.b.c.d/10000) and IP/TCP 1089 (192.0.2.1/5000). That is, a datagram received from the host at 1090 address 192.0.0.2, using TCP SRC port 10000 will be translated to a 1091 datagram with IP SRC address 192.0.2.1 and TCP SRC port 5000 in the 1092 Internet.
1094 +-------------------+ 1095 | | 1096 |Host 192.0.0.2 | 1097 |+--------+--------+| 1098 || B4 || 1099 |+--------+--------+| 1100 +--------|||--------+ 1101 | |||2001:db8:0:1::1 1102 IPv6 datagram 1| ||| 1103 | |||<-IPv4-in-IPv6 softwire 1104 | ||| 1105 -----|-|||------- 1106 / | ||| \ 1107 | ISP core network | 1108 \ | ||| / 1109 -----|-|||------- 1110 | ||| 1111 | |||2001:db8:0:2::1 1112 +------|-|||--------+ 1113 | | AFTR | 1114 | v ||| | 1115 |+--------+--------+| 1116 || Concentrator || 1117 |+--------+--------+| 1118 | |NAT| | 1119 | +-+-+ | 1120 +---------|---------+ 1121 | |192.0.2.1 1122 IPv4 datagram 2 | | 1123 -----|--|-------- 1124 / | | \ 1125 | Internet | 1126 \ | | / 1127 -----|--|-------- 1128 | | 1129 v |198.51.100.1 1130 +-----+-----+ 1131 | IPv4 Host | 1132 +-----------+ 1134 Figure 5: Outbound Datagram 1136 +-----------------+--------------+-----------------+ 1137 | Datagram | Header field | Contents | 1138 +-----------------+--------------+-----------------+ 1139 | IPv6 Datagram 1 | IPv6 Dst | 2001:db8:0:2::1 | 1140 | | IPv6 Src | 2001:db8:0:1::1 | 1141 | | IPv4 Dst | 198.51.100.1 | 1142 | | IPv4 Src | a.b.c.d | 1143 | | TCP Dst | 80 | 1144 | | TCP Src | 10000 | 1145 | --------------- | ------------ | ------------- | 1146 | IPv4 datagram 2 | IPv4 Dst | 198.51.100.1 | 1147 | | IPv4 Src | 192.0.2.1 | 1148 | | TCP Dst | 80 | 1149 | | TCP Src | 5000 | 1150 +-----------------+--------------+-----------------+ 1152 Datagram header contents 1154 When sending an IPv4 packet, the dual-stack lite host encapsulates it 1155 in datagram 1 and forwards it to the AFTR over the softwire. 1157 When it receives datagram 1, the concentrator in the AFTR hands the 1158 IPv4 datagram to the NAT, which determines from its translation table 1159 that the datagram received on Softwire_1 with TCP SRC port 10000 1160 should be translated to datagram 2 with IP SRC address 192.0.2.1 and 1161 TCP SRC port 5000. 1163 Figure 6 shows an inbound message received at the AFTR.
When the NAT 1164 function in the AFTR receives datagram 1, it looks up the IP/TCP DST 1165 in its translation table. In the example in Figure 6, the NAT 1166 translates the TCP DST port to 10000, sets the IP DST address to 1167 a.b.c.d and hands the datagram to the concentrator for transmission 1168 over Softwire_1. The B4 in the dual-stack lite host decapsulates the 1169 IPv4 datagram from the inbound softwire datagram, and forwards it to 1170 the host. 1172 +-------------------+ 1173 | | 1174 |Host 192.0.0.2 | 1175 |+--------+--------+| 1176 || B4 || 1177 |+--------+--------+| 1178 +--------|||--------+ 1179 ^ |||2001:db8:0:1::1 1180 IPv6 datagram 2 | ||| 1181 | |||<-IPv4-in-IPv6 softwire 1182 | ||| 1183 -----|-|||------- 1184 / | ||| \ 1185 | ISP core network | 1186 \ | ||| / 1187 -----|-|||------- 1188 | ||| 1189 | |||2001:db8:0:2::1 1190 +------|-|||--------+ 1191 | AFTR | 1192 | | ||| | 1193 |+--------+--------+| 1194 || Concentrator || 1195 |+--------+--------+| 1196 | |NAT| | 1197 | +-+-+ | 1198 +---------|---------+ 1199 ^ |192.0.2.1 1200 IPv4 datagram 1 | | 1201 -----|--|-------- 1202 / | | \ 1203 | Internet | 1204 \ | | / 1205 -----|--|-------- 1206 | | 1207 | |198.51.100.1 1208 +-----+-----+ 1209 | IPv4 Host | 1210 +-----------+ 1212 Figure 6: Inbound Datagram 1214 +-----------------+--------------+-----------------+ 1215 | Datagram | Header field | Contents | 1216 +-----------------+--------------+-----------------+ 1217 | IPv4 datagram 1 | IPv4 Dst | 192.0.2.1 | 1218 | | IPv4 Src | 198.51.100.1 | 1219 | | TCP Dst | 5000 | 1220 | | TCP Src | 80 | 1221 | --------------- | ------------ | ------------- | 1222 | IPv6 Datagram 2 | IPv6 Dst | 2001:db8:0:1::1 | 1223 | | IPv6 Src | 2001:db8:0:2::1 | 1224 | | IPv4 Dst | a.b.c.d | 1225 | | IPv4 Src | 198.51.100.1 | 1226 | | TCP Dst | 10000 | 1227 | | TCP Src | 80 | 1228 +-----------------+--------------+-----------------+ 1230 Datagram header contents 1232 13.2.2.
Translation details 1234 The translations happening in the AFTR are the same as in the 1235 previous examples. The well-known IPv4 address 192.0.0.2 out of the 1236 192.0.0.0/29 (defined by IANA) range used by all the hosts is 1237 disambiguated by the IPv6 source address of the softwire. 1239 +-----------------------------------+--------------------+ 1240 | Softwire-Id/IPv4/Prot/Port | IPv4/Prot/Port | 1241 +-----------------------------------+--------------------+ 1242 | 2001:db8:0:1::1/a.b.c.d/TCP/10000 | 192.0.2.1/TCP/5000 | 1243 +-----------------------------------+--------------------+ 1245 Dual-Stack lite carrier-grade NAT translation table 1247 The Softwire-Id is the IPv6 address assigned to the Dual-Stack host. 1248 Each host has a unique Softwire-Id. The source IPv4 address is one 1249 of the well-known IPv4 addresses. The AFTR could receive packets from 1250 different hosts sourced from the same IPv4 well-known address from 1251 different softwire tunnels. Similar to the gateway architecture, the 1252 AFTR combines the Softwire-Id and IPv4 address/Port [Softwire-Id, 1253 IPv4+Port] to uniquely identify the individual host. 1255 14. Appendix C: Related DS-Lite work on port management 1257 Techniques discussed below are not part of the core dual-stack lite 1258 specification and may or may not be standardized in separate 1259 documents. They are only listed here for reference. 1261 Applications expecting incoming connections, such as peer-to-peer 1262 applications, have become popular. Those applications use a very 1263 limited number of ports, usually a single one. Making sure those 1264 applications keep working in a dual-stack lite environment is 1265 important. Similarly, there is a growing list of applications that 1266 require some kind of ALG to work through a NAT. Service provider 1267 AFTRs should not prevent the deployment of such applications.
As 1268 such, there is a legitimate need to leave certain ports under the 1269 control of the end user or its applications. This argues for a 1270 hybrid environment, where most ports are dynamically managed by the 1271 AFTR in a shared pool and a limited number are dedicated per user 1272 and controlled by them. 1274 The details of how ports can be controlled by users and applications 1275 are beyond the scope of this document. For reference, the A+P 1276 [I-D.ymbk-aplusp] model where an address and a set of ports are 1277 assigned to users has been extensively discussed. User-controlled 1278 techniques for port allocation via a service provider portal or a 1279 DHCPv6 option [I-D.bajko-v6ops-port-restricted-ipaddr-assign] have 1280 been proposed. Techniques using some form of port control protocol 1281 such as UPnP [UPnP-IGD], NAT-PMP [I-D.cheshire-nat-pmp] and PCP 1282 [I-D.wing-softwire-port-control-protocol] are under discussion to 1283 enable a direct communication between applications and the service 1284 provider NAT. 1286 15. References 1288 15.1. Normative references 1290 [I-D.ietf-softwire-ds-lite-tunnel-option] 1291 Hankins, D. and T. Mrugalski, "Dynamic Host Configuration 1292 Protocol for IPv6 (DHCPv6) Options for Dual-Stack Lite", 1293 draft-ietf-softwire-ds-lite-tunnel-option-03 (work in 1294 progress), June 2010. 1296 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1297 Requirement Levels", BCP 14, RFC 2119, March 1997. 1299 [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in 1300 IPv6 Specification", RFC 2473, December 1998. 1302 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1303 "Definition of the Differentiated Services Field (DS 1304 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1305 December 1998. 1307 [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms 1308 for IPv6 Hosts and Routers", RFC 4213, October 2005.
1310 [RFC5625] Bellis, R., "DNS Proxy Implementation Guidelines", 1311 BCP 152, RFC 5625, August 2009. 1313 15.2. Informative references 1315 [I-D.bajko-v6ops-port-restricted-ipaddr-assign] 1316 Bajko, G. and T. Savolainen, "Port Restricted IP Address 1317 Assignment", 1318 draft-bajko-v6ops-port-restricted-ipaddr-assign-02 (work 1319 in progress), November 2008. 1321 [I-D.cheshire-nat-pmp] 1322 Cheshire, S., "NAT Port Mapping Protocol (NAT-PMP)", 1323 draft-cheshire-nat-pmp-03 (work in progress), April 2008. 1325 [I-D.droms-softwires-snat] 1326 Droms, R. and B. Haberman, "Softwires Network Address 1327 Translation (SNAT)", draft-droms-softwires-snat-01 (work 1328 in progress), July 2008. 1330 [I-D.durand-dual-stack-lite] 1331 Durand, A., "Dual-stack lite broadband deployments post 1332 IPv4 exhaustion", draft-durand-dual-stack-lite-00 (work in 1333 progress), July 2008. 1335 [I-D.ford-shared-addressing-issues] 1336 Ford, M., Boucadair, M., Durand, A., Levis, P., and P. 1337 Roberts, "Issues with IP Address Sharing", 1338 draft-ford-shared-addressing-issues-02 (work in progress), 1339 March 2010. 1341 [I-D.nishitani-cgn] 1342 Yamagata, I., Miyakawa, S., Nakagawa, A., and H. Ashida, 1343 "Common requirements for IP address sharing schemes", 1344 draft-nishitani-cgn-05 (work in progress), July 2010. 1346 [I-D.templin-seal] 1347 Templin, F., "The Subnetwork Encapsulation and Adaptation 1348 Layer (SEAL)", draft-templin-seal-23 (work in progress), 1349 August 2008. 1351 [I-D.wing-softwire-port-control-protocol] 1352 Wing, D., Penno, R., and M. Boucadair, "Pinhole Control 1353 Protocol (PCP)", 1354 draft-wing-softwire-port-control-protocol-02 (work in 1355 progress), July 2010. 1357 [I-D.ymbk-aplusp] 1358 Bush, R., "The A+P Approach to the IPv4 Address Shortage", 1359 draft-ymbk-aplusp-05 (work in progress), October 2009. 1361 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 1362 November 1990. 
1364 [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and 1365 E. Lear, "Address Allocation for Private Internets", 1366 BCP 5, RFC 1918, February 1996. 1368 [RFC2663] Srisuresh, P. and M. Holdrege, "IP Network Address 1369 Translator (NAT) Terminology and Considerations", 1370 RFC 2663, August 1999. 1372 [RFC2993] Hain, T., "Architectural Implications of NAT", RFC 2993, 1373 November 2000. 1375 [RFC4787] Audet, F. and C. Jennings, "Network Address Translation 1376 (NAT) Behavioral Requirements for Unicast UDP", BCP 127, 1377 RFC 4787, January 2007. 1379 [RFC5382] Guha, S., Biswas, K., Ford, B., Sivakumar, S., and P. 1380 Srisuresh, "NAT Behavioral Requirements for TCP", BCP 142, 1381 RFC 5382, October 2008. 1383 [RFC5508] Srisuresh, P., Ford, B., Sivakumar, S., and S. Guha, "NAT 1384 Behavioral Requirements for ICMP", BCP 148, RFC 5508, 1385 April 2009. 1387 [RFC5571] Storer, B., Pignataro, C., Dos Santos, M., Stevant, B., 1388 Toutain, L., and J. Tremblay, "Softwire Hub and Spoke 1389 Deployment Framework with Layer Two Tunneling Protocol 1390 Version 2 (L2TPv2)", RFC 5571, June 2009. 1392 [UPnP-IGD] 1393 UPnP Forum, "Universal Plug and Play Internet Gateway 1394 Device Standardized Gateway Device Protocol", 1395 September 2006, 1396 . 1398 Authors' Addresses 1400 Alain Durand 1401 Juniper Networks 1402 1194 North Mathilda Avenue 1403 Sunnyvale, CA 94089-1206 1404 USA 1406 Email: adurand@juniper.net 1408 Ralph Droms 1409 Cisco 1410 1414 Massachusetts Avenue 1411 Boxborough, MA 01714 1412 USA 1414 Email: rdroms@cisco.com 1416 James Woodyatt 1417 Apple 1418 1 Infinite Loop 1419 Cupertino, CA 95014 1420 USA 1422 Email: jhw@apple.com 1424 Yiu Lee 1425 Comcast 1426 1, Comcast center 1427 Philadelphia, PA 19103 1428 USA 1430 Email: yiu_lee@cable.comcast.com