Network Working Group                                       R. Bush, Ed.
Internet-Draft                                 Internet Initiative Japan
Intended status: Standards Track                        October 27, 2009
Expires: April 30, 2010


             The A+P Approach to the IPv4 Address Shortage
                          draft-ymbk-aplusp-05

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may not be modified,
   and derivative works of it may not be created, and it may not be
   published except as an Internet-Draft.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 30, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   We are facing the exhaustion of the IANA IPv4 free IP address pool.

   Unfortunately, IPv6 is not yet deployed widely enough to fully
   replace IPv4, and it is unrealistic to expect that this is going to
   change before we run out of IPv4 addresses.  Letting hosts seamlessly
   communicate in an IPv4-world without assigning a unique globally
   routable IPv4 address to each of them is a challenging problem.

   This draft discusses the possibility of address sharing by treating
   some of the port number bits as part of an extended IPv4 address
   (Address plus Port, or A+P).  Instead of assigning a single IPv4
   address to a customer device, we propose to extend the address by
   "stealing" bits from the port number in the TCP/UDP header, leaving
   the applications a reduced range of ports.  This means assigning the
   same IPv4 address to multiple clients (e.g., CPE, mobile phones),
   each with its assigned port-range.
   In the face of IPv4 address exhaustion, the need for addresses is
   stronger than the need to be able to address thousands of
   applications on a single host.  If address translation is needed,
   the end-user should be in control of the translation process - not
   some smart boxes in the core.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction
     1.1.  Why Carrier Grade NATs are Harmful
   2.  Design Constraints and Assumptions
     2.1.  Design constraints
     2.2.  Terminology
   3.  Overview of the A+P Solution
     3.1.  Signaling
     3.2.  Address realm
     3.3.  Reasons for allowing multiple A+P gateways
   4.  Deployment Scenarios
     4.1.  A+P for Broadband Providers
     4.2.  A+P for Mobile Providers
     4.3.  A+P from the provider network perspective
     4.4.  Dynamic allocation of port ranges
     4.5.  Overall A+P architecture
     4.6.  Example of A+P-forwarded packets
     4.7.  Forwarding of standard packets
     4.8.  Handling ICMP
     4.9.  Limitations of the A+P approach
   5.  IANA Considerations
   6.  Security Considerations
   7.  Authors
   8.  Acknowledgments
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Author's Address

1.  Introduction

   This document describes a technique to deal with the imminent IPv4
   address space exhaustion.  Many large Internet Service Providers
   (ISPs) face the problem that their networks' customer edges are so
   large that it will soon not be possible to provide each customer with
   a unique IPv4 address.  Therefore these ISPs have to devise something
   more ingenious.  Although undesirable, address sharing, a la NAT, is
   inevitable.

   To allow end-to-end connectivity between IPv4 speaking applications
   we propose to "steal" some bits from the UDP/TCP header and use them
   to extend addressing of devices.  Assuming we could limit the
   applications' port addressing to 8 (or 4) bits, we can increase the
   effective size of an IPv4 address by 8 (or 12) additional bits.  In
   this scenario, 256 (or 4096) customers could be multiplexed on the
   same IPv4 address, while allowing them a fixed range of 256 (or 16)
   ports.  Customers that require larger port-ranges could dynamically
   request additional blocks, depending on their contract.  We call this
   "extended addressing" or "A+P" (Address plus Port) addressing.  The
   main advantage of A+P is that it preserves the Internet "end-to-end"
   paradigm by not translating (at least some ports of) an IP address.
   With NAT in the core of the network, this end-to-end connectivity is
   broken.
   As long as the customer chooses to do this on his/her premises, this
   is a choice that he/she makes; however, this is not an option in the
   face of the looming IPv4 address exhaustion, where so-called Carrier
   Grade NATs (CGNs) might be deployed within the provider's network -
   beyond control of the customer.  CGNs come with different names and
   in different flavors, such as NAT444, Large Scale NATs (LSNs) or
   Address Family Transition Routers (AFTR).

1.1.  Why Carrier Grade NATs are Harmful

   Various forms of NATs will be installed at various levels and places
   in the IPv4-Internet to achieve address compression.  This document
   argues for mechanisms where this happens as close to the edge as
   possible, thereby minimizing damage to the End to End Principle.
   End-customers will not be locked into a walled-garden without any
   control over the translation.  It is essential to create mechanisms
   to "bypass" NATs in the core, and keep the control at the end-user:

   "Carrier grade" is a euphemism for centralized.  More semantics move
   to the core of the network.  This is bad in and of itself.  Net-heads
   call it "telco-think" because it is the telco model of smarts in the
   core as opposed to the Internet model of a simple, just-forward-
   packets core, with smart edges.  It also places the provider in a
   position where the user is trapped behind unchangeable application
   policies, and carries the danger of invoking lawyers when users wish
   to deploy new applications needing Application Level Gateways (ALGs).
   This is the opposite of the "end-to-end" model of the Internet.

   With the smarts at the edges, one can easily field new protocols
   between consenting end-points by merely tweaking the NATs at the
   corresponding Customer Premises Equipment (CPE), even adding
   application layer gateways if they are needed.

   Today's NATs are typically mitigated by ALGs over which the customer
   has control, e.g. port forwarding or UPnP/NAT-PMP.  However, this is
   not expected to work with CGNs.  CGN proposals - other than DS-Lite
   [I-D.ietf-softwire-dual-stack-lite] with A+P - admit that it is not
   expected that applications that require specific port assignment or
   port mapping from the NAT box will keep working.  This is the
   ultimate horror the NAT-haters fear, and, in this case, they are not
   all that wrong.

   We believe this CGN approach is not an option and that the end-user
   must have the ability to control their own ALGs.  With CGN, if a user
   wishes to deploy a new application, they must talk to the provider's
   lawyers or run new disruptive technology over HTTP; we can pick our
   poison.  And if the NAT is not where the customer can directly
   control it, i.e., it is anywhere in the provider's network, then the
   provider controls what the user can control, i.e. it is not really
   under user control.  We do not wish to deal with the case where the
   provider has to decide whether to allow Skype v42 when they
   themselves provide a competing VoIP product.

   Another issue with CGN is scalability.  ISPs face a tension in
   placing CGNs within their network: they want to aggregate as much as
   possible, but too much aggregation creates a massive state problem.
   CGNs also present a single point of failure.  And having a back-up
   CGN has the state transfer problem as well as exposure to network
   partition and dual-device failure.  When you start talking about
   'high reliability/availability', you have already lost the game.  The
   Internet is about building a reliable network using unreliable
   devices.

   To reduce the state, NAT placement ends up as CGNs somewhere closer
   to the edge.  It is not clear how a CGN should maintain per-session
   state in a scalable manner.
   State for improperly terminated sessions could remain stale for some
   time.  The CGN hence trades scalability for the amount of state that
   needs to be kept, which makes optimally placing a CGN a hard
   engineering problem.

   Furthermore, with CGN, tracing hackers, spammers and other criminals
   will be impossible, unless all the connection based mapping
   information is recorded and stored.  This would not only cause
   concern for law enforcement services, but also for privacy advocates.

2.  Design Constraints and Assumptions

   The problem of address space shortage is first felt by providers with
   a very large end-user customer base, such as broadband providers and
   mobile-service providers.  Though the cases and requirements are
   slightly different, they share many commonalities.  In the following
   we will develop a set of overall design constraints.

2.1.  Design constraints

   We regard several constraints as important for our design:

   1)   End-to-End is under customer control: Customers shall have
        the ability to deploy new application protocols at will.
        IPv4 address shortage should not be a license to break the
        Internet's end-to-end paradigm.

   2)   End-to-End transparency through multiple intermediate
        devices: Multiple gateways should be able to operate in
        sequence along one data path without interfering with each
        other.

   3)   Backward compatibility: Approaches should be transparent to
        unaware users.  Devices or existing applications should be
        able to work without modification.  Emergence of new
        applications should not be limited.

   4)   Incrementally deployable: The provider should not be forced
        to replace unaffected core devices or replace customer
        premises equipment (CPE).  In particular, the provider should
        be able to change only CPE where they wish to deploy A+P.  And
        customers should be able to acquire A+P aware CPE at will.

   5)   Highly-scalable and minimal state core: Minimal state should
        be kept inside the ISP's network.  If the operator is rolling
        out A+P incrementally, it is understood there may be state in
        the core in the non-A+P part of such a roll-out.

   6)   Efficiency vs. complexity: Operators should have the
        flexibility to trade off port multiplexing efficiency and
        scalability and end-to-end transparency.

   7)   Automatic configuration/administration: There should be no
        need for customers to call the ISP and tell them that they
        are operating their own A+P-gateway devices.  Customers/
        mobile phone users should not be expected to look up assigned
        ports manually on websites and then configure them on devices
        or applications.

   8)   "Double-NAT" should be avoided: Based on Constraint 2,
        multiple gateway devices might be present in a path, and once
        one has done some translation, those packets should not be
        re-translated.

   9)   Legal traceability: ISPs must be able to provide the identity
        of a customer from the knowledge of the IPv4 public address
        and the port.  This should have as low an impact as is
        reasonable on storage by the ISP.  We assume that NATs on
        customer premises do not pose much of a problem, while
        provider NATs need to keep additional logs.

   10)  IPv6 deployment should be encouraged.  NAT444 strongly biases
        the users to the deployment of RFC 1918 addressing.  A+P
        should not.  While we acknowledge that A+P might be used in
        an IPv4-only environment (e.g., [I-D.boucadair-port-range])
        we strongly believe that IPv6 is the best long-term approach,
        and that A+P should be considered only as an intermediate
        hack towards an IPv6-only world.  We therefore prefer to
        assume in Constraint 10 that the ISP has migrated to a dual-
        stack core and A+P can use IPv6 as a transport inside the
        network.  This ensures that A+P will not be a hindrance to
        the introduction of IPv6.
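
   As an illustration of Constraint 9, the following Python sketch shows
   one way an ISP might answer "who used this address and port at this
   time" from allocation records alone.  It assumes that each port range
   is held by exactly one customer for a known lifetime, so that one
   record per allocation is enough.  The class name, record layout,
   customer label and the 192.0.2.1 documentation address are
   hypothetical and are not taken from this draft.

```python
from datetime import datetime


class AllocationLog:
    """Keeps one record per port-range allocation, not per connection.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._records = []

    def record(self, addr, first_port, last_port, start, end, customer):
        """Log that `customer` held addr:first_port-last_port in [start, end)."""
        self._records.append((addr, first_port, last_port, start, end, customer))

    def who_used(self, addr, port, when):
        """Return the customer that held (addr, port) at `when`, or None."""
        for a, lo, hi, start, end, customer in self._records:
            if a == addr and lo <= port <= hi and start <= when < end:
                return customer
        return None


# Example data: 192.0.2.1 is a documentation address, not a real allocation.
log = AllocationLog()
log.record("192.0.2.1", 2560, 3071,
           datetime(2009, 10, 27, 8, 0), datetime(2009, 10, 27, 20, 0),
           "customer-17")
```

   A query such as log.who_used("192.0.2.1", 3000, datetime(2009, 10,
   27, 12, 0)) is answered from a single record.  A NAT in the provider
   network that assigns ports per connection would instead need one
   record per mapping.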

   Constraints 2 and 8 are important: while many techniques have been
   deployed to allow applications to work through a NAT, traversing
   cascaded NATs is crucial if NATs are being deployed in the core of a
   provider network.

2.2.  Terminology

   The A+P architecture can be split into three distinct functions:
   encaps/decaps, NAT, and signaling.

   Encaps/decaps function: is used to forward port-restricted A+P-
   packets over intermediate legacy devices.  The encapsulation function
   takes an IPv4 packet, looks up the IP and TCP/UDP headers, and puts
   the packet into the appropriate tunnel.  The state needed to perform
   this action is comparable to a forwarding table.  The decapsulation
   device SHOULD check if the source address and port of packets coming
   out of the tunnel are legitimate (e.g., see [BCP38]).  Based on the
   result of such a check, the packet MAY be forwarded untranslated, it
   MAY be discarded or MAY be NATed.  In this draft we refer to a device
   that provides this encaps/decaps functionality as Port-Range-Router
   (PRR).

   Network Address Translation (NAT) function: is used to connect legacy
   end-hosts.  Unless upgraded, end-hosts or end-systems are not aware
   of A+P restrictions and therefore assume a full IP address.  The NAT
   function performs any address or port translation, including
   application-level-gateways (ALGs).  The state that has to be kept to
   implement this function is the mapping for which external addresses
   and ports have been mapped to which internal addresses and ports,
   just as in CPE NATs today.  A subtle, but very important, difference
   should be noted here: the customer has control over the NATing
   process or might choose to "bypass" the NAT.  If this is done, we
   call the NAT a large scale NAT (LSN).  However, if the NAT does NOT
   allow the customer to control the translation process, we refer to
   it as a CGN.
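
   The state kept by the NAT function can be sketched as follows.  This
   is a minimal Python illustration, not a definitive implementation: it
   assumes a single external address, a contiguous assigned port range,
   and one port pool shared by all protocols for brevity.  The class and
   method names and all addresses are hypothetical.

```python
class PortRestrictedNat:
    """Toy many-to-one NAT confined to one address and one port range."""

    def __init__(self, external_addr, first_port, last_port):
        self.external_addr = external_addr
        # Only ports inside the assigned range may be used for mappings.
        self.free_ports = list(range(first_port, last_port + 1))
        self.out = {}   # (proto, internal_addr, internal_port) -> external_port
        self.back = {}  # (proto, external_port) -> (internal_addr, internal_port)

    def translate_outbound(self, proto, internal_addr, internal_port):
        """Return the (external_addr, external_port) used for this flow."""
        key = (proto, internal_addr, internal_port)
        if key not in self.out:
            if not self.free_ports:
                raise RuntimeError("assigned port range exhausted")
            ext_port = self.free_ports.pop(0)
            self.out[key] = ext_port
            self.back[(proto, ext_port)] = (internal_addr, internal_port)
        return self.external_addr, self.out[key]

    def translate_inbound(self, proto, external_port):
        """Return the internal (addr, port) for a reply, or None if unmapped."""
        return self.back.get((proto, external_port))
```

   The only difference from an unrestricted NAT in this sketch is the
   bounded pool of external ports.  Mapping timeouts, ALGs and ICMP
   handling are left out.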

   Signaling function: is used to let A+P-aware devices learn which
   ports are assigned to be passed through untranslated and what will
   happen to packets outside the assigned port-range (e.g., could be
   NATed or discarded).  Signaling may also be used to learn the
   encapsulation method and any endpoint information needed.  In
   addition, the signaling function may be used to dynamically
   increase/decrease the requested port-range.

   A+P address realm: a public routable IPv4 address that is port
   restricted (A+P).  Forwarding of packets is done based on the IPv4
   address and the TCP/UDP port numbers.  When this draft talks about
   "A+P packets" it is assumed that those packets pass untranslated.

   Private address realm: IPv4 addresses that are not globally routed.
   They may be taken from the [RFC1918] range.  However, this draft does
   not make such an assumption.  We regard as private address space any
   IPv4 address that needs to be translated in order to gain global
   connectivity, irrespective of whether it falls in [RFC1918] space or
   not.

3.  Overview of the A+P Solution

   The core architectural elements of the A+P solution are three
   separated and independent functions: the NAT function, the encaps/
   decaps function, and the signaling function.  The NAT function is
   similar to a NAT as we know it today: it performs a translation
   between two different address realms.  When the external realm is
   public IPv4 address space, we assume that the translation is many-to-
   one, in order to multiplex many customers on a single public IPv4
   address.  The only difference with a traditional NAT (Figure 1) is
   that the translator might only be able to use a restricted range of
   ports when mapping multiple internal addresses onto an external one,
   e.g., the external address realm might be port-restricted.
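
   To make the multiplexing of many customers on one public IPv4 address
   concrete, the following Python sketch computes how the 16-bit port
   space divides when some port bits are used to extend the address.
   The function names are hypothetical, and the contiguous, equal-sized
   block layout is an assumption made for illustration; this draft does
   not mandate a particular layout.

```python
PORT_BITS = 16  # TCP and UDP port numbers are 16 bits wide


def split_port_space(app_port_bits):
    """Return (customers_per_address, ports_per_customer).

    app_port_bits is the number of port bits left to applications; the
    remaining high-order bits extend the IPv4 address.
    """
    if not 0 <= app_port_bits <= PORT_BITS:
        raise ValueError("app_port_bits must be between 0 and 16")
    address_extension_bits = PORT_BITS - app_port_bits
    return 2 ** address_extension_bits, 2 ** app_port_bits


def port_range(block_index, app_port_bits):
    """Contiguous port block for one customer (assumed layout)."""
    customers, ports = split_port_space(app_port_bits)
    if not 0 <= block_index < customers:
        raise ValueError("block_index out of range")
    start = block_index * ports
    return range(start, start + ports)
```

   With 8 bits left to applications, split_port_space(8) gives 256
   customers with 256 ports each; with 4 bits, 4096 customers with 16
   ports each.  With 9 bits, port_range(5, 9) covers ports 2560-3071,
   a block of 512 ports.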

         "internal-side"          "external-side"
                          +-----+
                internal  |  N  |  external
              address <---|  A  |---> address
                realm     |  T  |     realm
                          +-----+

                       Traditional NAT

                           Figure 1

   The encaps/decaps function, on the other hand, is the ability to
   establish a tunnel with another end-point providing the same
   function.  This implies some form of signaling to establish a tunnel.
   Such signaling can be viewed as integrated with DHCP or as a separate
   service.  Section 3.1 discusses the constraints of this signaling
   function.  The tunnel can be an IPv6 or IPv4 encapsulation, a layer-2
   tunnel, or some other form of softwire.  Note that the presence of a
   tunnel allows unmodified, naive, or even legacy devices between the
   two endpoints.

   Two or more devices which provide the encaps/decaps function and are
   linked by tunnels form an A+P subsystem.  The function of each
   gateway is to encapsulate and decapsulate respectively.  Figure 2
   depicts the simplest possible A+P subsystem, that is, two devices
   providing the encaps/decaps function.

                      +------------------------------------+
     port-restricted  | +----------+  tunnel  +----------+ |  external
     address realm  --|-| gateway  |==========| gateway  |-|-- address
                      | +----------+          +----------+ |  realm
                      +------------------------------------+
                                  A+P subsystem

                         A simple A+P subsystem

                                Figure 2

   Within an A+P subsystem, the external address realm is extended by
   "stealing" bits from the port number.  Each device is assigned one
   address from the external realm and a range of port numbers.  Hence,
   devices which are part of an A+P subsystem can communicate with the
   external realm without the need for address translation (i.e.,
   preserving end-to-end packet integrity): an A+P packet originated
   from within the A+P subsystem can be simply forwarded over tunnels up
   to the endpoint, where it gets decapsulated and routed in the
   external realm.

3.1.  Signaling

   The following information needs to be available on all the gateways
   in the A+P subsystem.  It is expected that there will be a signaling
   protocol such as [I-D.bajko-pripaddrassign],
   [I-D.boucadair-dhcpv6-shared-address-option], or
   [I-D.boucadair-pppext-portrange-option].  The information that needs
   to be shared is the following:

   o  a set of public IPv4 addresses,

   o  for each IPv4 address a starting point for the allocated port-
      range,

   o  number of delegated ports,

   o  optional key that enables partial or full preservation of entropy
      in port randomization - see [I-D.bajko-pripaddrassign],

   o  lifetime for each IPv4 address and allocated port-set,

   o  the tunneling technology to be used (e.g., "IPv6-encapsulation")

   o  addresses of the tunnel endpoints (e.g., IPv6 address of tunnel
      endpoints)

   o  whether or not NAT function is provided by the gateway

   o  a device identification number and some authentication mechanisms

   o  a version number and some reserved bits for future use.

   Note that the functions of encapsulation and decapsulation have been
   separated from the NAT function.  However, to accommodate legacy
   hosts, NATing is likely to be provided at some point in the path;
   therefore the availability or absence of NATing MUST be communicated
   in signaling, as A+P is agnostic about NAT placement.

   The port-ranges can be allocated in two different ways:

   o  If applications or end-hosts behind the CPE are not UPnPv2/NAT-PMP
      aware, then the CPE SHOULD request ports via mechanisms such as
      those described in [I-D.bajko-pripaddrassign] and
      [I-D.boucadair-pppext-portrange-option].  Note that different
      port-ranges can have different lifetimes, and the CPE is not
      entitled to use them after they expire - unless it refreshes those
      ranges.
      It is up to the ISP to put mechanisms in place that determine
      what percentage of already allocated port-ranges should be
      exhausted before a CPE may request additional ranges, how often
      the CPE can request additional ranges, and so on.  (To prevent
      Denial of Service attacks.)

   o  If applications behind the CPE are UPnPv2/NAT-PMP aware,
      additional ports MAY be requested through that mechanism.  In this
      case the CPE should forward those requests to the LSN and the LSN
      should reply reporting if the requested ports are available or not
      (and if they are not available some alternatives should be
      offered).  Here again, to prevent potential denial of service
      attacks, mechanisms should be in place to prevent UPnPv2/NAT-PMP
      packet storms and fast port allocation.

   Whatever signaling mechanism is used inside the tunnels, DHCP or IPCP
   based, synchronization between the signaling server and the PRR must
   be established in both directions.  For example, if we use DHCP as
   the signaling mechanism, the PRR must communicate to the DHCP server
   at least its IP range.  The DHCP server then starts to allocate IPs
   and port-ranges to CPEs and communicates back to the PRR which IP and
   port range have been allocated to which CPE, so the PRR knows to
   which tunnel to redirect incoming traffic.  In addition, DHCP MUST
   also communicate lifetimes of port-ranges assigned to CPE via the
   PRR.

   If UPnPv2/NAT-PMP is used as the dynamic port allocation mechanism,
   the PRR must also tell the DHCP (or IPCP) server to avoid those
   ports.  The PRR must somehow (DHCP or IPCP options) communicate back
   to the CPE that allocation of ports was successful, so that the CPE
   adds those ports to its existing port-ranges.

3.2.  Address realm

   Each gateway within the A+P subsystem manages a certain portion of
   A+P address space, that is, a portion of IPv4 space which is extended
   by borrowing bits from the port number.
   This address space may be a single, port-restricted IPv4 address.
   The gateway MAY use its managed A+P address space for several
   purposes:

   o  Allocation of a sub-portion of the A+P address space to other
      authenticated A+P gateways in the A+P subsystem (referred to as
      delegation).  We call the allocated sub-portion delegated address
      space.

   o  Exchange of (untranslated) packets with the external address
      realm.  For this to work, such packets MUST use source address and
      port belonging to the non-delegated address space.

   If the gateway is also capable of performing the NAT function, it MAY
   translate packets arriving on an internal interface which are outside
   of its managed A+P address space into non-delegated address space.

   Hence, a provider may have 'islands' of A+P as they slowly deploy
   over time.  The provider does not have to replace CPE until they want
   to provide the A+P function to an island of users or even to one
   particular user in a sea of non-A+P users.

   An A+P gateway ("A") accepts incoming connections from other A+P
   gateways ("B").  Upon connection establishment (provided appropriate
   authentication has taken place), B would "ask" A for delegation of an
   A+P address.  In turn, A will inform B about its public IPv4 address,
   and will delegate a portion of its port-range to B.  In addition, A
   will also negotiate the encaps/decaps function with B (e.g., let B
   know the address of the decaps device/other-end-point of the tunnel).

   This could be implemented for example via a NAT-PMP or DHCP-like
   solution.  In general the following rule applies: A sub-portion of
   the managed A+P address space is delegated as long as devices below
   ask for it, otherwise private IPv4 is provided to support legacy
   hosts.
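
   The bookkeeping behind delegation can be sketched as follows.  This
   minimal Python illustration assumes contiguous port blocks and leaves
   out authentication, lifetimes and the signaling protocol itself.  The
   class and method names are hypothetical, and 192.0.2.1 is a
   documentation address standing in for A's public address.

```python
class APlusPGateway:
    """Gateway "A": manages one public address and a port range."""

    def __init__(self, public_addr, first_port, last_port):
        self.public_addr = public_addr
        self.managed = range(first_port, last_port + 1)
        self.delegations = {}  # child gateway id -> delegated port block

    def delegate(self, child_id, first_port, count):
        """Delegate `count` ports starting at `first_port` to a child."""
        if count < 1:
            raise ValueError("count must be positive")
        block = range(first_port, first_port + count)
        if block[0] not in self.managed or block[-1] not in self.managed:
            raise ValueError("block is outside the managed A+P space")
        for other in self.delegations.values():
            if block.start < other.stop and other.start < block.stop:
                raise ValueError("block overlaps an existing delegation")
        self.delegations[child_id] = block
        # The child learns the shared public address and its port block.
        return self.public_addr, block

    def owner_of(self, port):
        """Return the child a port is delegated to, or None if the port
        belongs to the non-delegated address space."""
        for child_id, block in self.delegations.items():
            if port in block:
                return child_id
        return None
```

   For example, a gateway managing ports 0-65535 that delegates 512
   ports starting at 2560 to "B" hands out the block 2560-3071.
   Afterwards owner_of(3000) returns "B", while owner_of(80) returns
   None because port 80 remains in the non-delegated space.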

      private    +-----+          +-----+    public
      address ---|  B  |==========|  A  |--- Internet
      realm      +-----+          +-----+

      Address space realm of A:
         public IPv4 address = 12.0.0.1
         port range = 0-65535

      Address space realm of B:
         public IPv4 address = 12.0.0.1
         port range = 2560-3071

                            Figure 3

   Figure 3 illustrates a sample configuration.  Note that A might
   actually consist of three different devices: one that handles
   signaling requests from B; one device that performs encapsulation and
   decapsulation; and, if provided, one device that performs NATing
   function (e.g., LSN).  Packet forwarding is assumed to be as follows:
   In the "out-bound" case, a packet arrives from the private address
   realm to B.  As stated above, B has two options: it can either apply
   or not apply the NAT function.  The decision depends upon the
   specific configuration and/or the capabilities of A and B.  Note that
   NAT functionality is required to support legacy hosts, however, this
   can be done at either of the two devices A or B.  The term NAT refers
   to translating the packet into the managed A+P address (B has address
   12.0.0.1 and ports 2560-3071 in the example above).  We then have two
   options:

   1)  B NATs the packet.  The translated packet is then tunneled to A.
       A recognizes that the packet has already been translated, because
       the source address and port match the delegated space.  A
       decapsulates the packet and releases it in the public Internet.

   2)  B does not NAT the packet.  The untranslated packet is then
       tunneled to A.  A recognizes that the packet has not been
       translated, so A forwards the packet to a co-located NATing
       device, which translates the packet and routes it in the public
       Internet.  This device, e.g., an LSN, has to store the mapping
       between the source port used to NAT and the tunnel where the
       packet came from, in order to correctly route the reply.
       Note that A cannot use a port number from the range that has been
       delegated to B.  As a consequence A has to assign a part of its
       non-delegated address space to the NATing function.

   "Inbound" packets are handled in the following way: a packet from the
   public realm arrives at A.  A analyzes the destination port number to
   understand whether the packet needs to be NATed or not.

   1)  If the destination port number belongs to the range that A
       delegated to B, then A tunnels the packet to B.  B NATs the packet
       using its stored mapping and forwards the translated packet to
       the private domain.

   2)  If the destination port number is from the address space of the
       LSN, then A passes the packet on to the co-located LSN which uses
       its stored mapping to NAT the packet into the private address
       realm of B.  The appropriate tunnel is stored as well in the
       mapping of the initial NAT.  The LSN then encapsulates the packet
       to B, which decapsulates it and normally routes it within its
       private realm.

   3)  Finally, if the destination port number neither falls in a
       delegated range, nor into the address range of the LSN, A
       discards the packet.  If the packet is passed to the LSN, but no
       mapping can be found, the LSN discards the packet.

3.3.  Reasons for allowing multiple A+P gateways

   Since each device in an A+P subsystem provides the encaps/decaps
   function, new devices can establish tunnels and become in turn part
   of an A+P subsystem.  As noted above, being part of an A+P subsystem
   implies the capability of talking to the external address realm
   without any translation.  In particular, as described in the previous
   section, a device X in an A+P subsystem can be reached from the
   external domain by simply using the public IPv4 address and a port
   which has been delegated to X.  Figure 4 shows an example where three
   devices are connected in a chain.
In other words, A+P signaling can 590 be used to extend end-to-end connectivity to the devices which are in 591 an A+P subsystem. This allows A+P-aware applications (or OSes) 592 running on end hosts to enter an A+P subsystem and exploit 593 untranslated connectivity. 595 There are two modes for end-hosts to gain fine-grained control of 596 end-to-end connectivity. The first is where actual end-hosts perform 597 the NAT function and the encaps/decaps function which is required to 598 join the A+P subsystem. This option works in a similar way to the 599 NAT-in-the-host trick employed by virtualization software such as 600 VMware, where the guest operating system is connected via a NAT to 601 the host operating system. The second mode is applications which 602 autonomously ask for an A+P address and use it to join the A+P 603 subsystem. This capability is necessary for some applications that 604 require end-to-end connectivity (e.g., applications that need to be 605 contacted from outside). 607 +---------+ +---------+ +---------+ 608 internal | gateway | | gateway | | gateway | external 609 realm --| 1 |======| 2 |======| 3 |-- realm 610 +---------+ +---------+ +---------+ 612 An A+P subsystem with multiple devices 614 Figure 4 616 Whatever the reasons might be, the Internet was built on a paradigm 617 that end-to-end connectivity is important. A+P makes this still 618 possible in a time where address shortage forces ISPs to use NATs at 619 various levels. In such sense, A+P can be regarded as a way to 620 bypass NATs. 622 +---+ (customer2) 623 |A+P|-. +---+ 624 +---+ \ NAT|A+P|-. 625 \ +---+ | 626 \ | forward if in-range 627 +---+ \+---+ +---+ / 628 |A+P|------|A+P|----|A+P|---- 629 +---+ /+---+ +---+ \ 630 / NAT if necessary 631 / (cust1) (prov. 
(e.g., provider NAT) 632 +---+ / router) 633 |A+P|-' 634 +---+ 636 A complex A+P subsystem 638 Figure 5 640 Figure 5 depicts a complex scenario, where the A+P subsystem is 641 composed by multiple devices organized in a hierarchy. Each A+P 642 gateway decapsulates the packet and then re-encapsulates it again to 643 the next tunnel. 645 A packet can either be NATed when it enters the A+P subsystem, or at 646 intermediate devices, or when it exits the A+P subsystem. This could 647 be for example a gateway installed within the provider's network, 648 together with a LSN. Then each customer operates its own CPE. 649 However, behind the CPE applications might also be A+P-aware and run 650 their own A+P-gateways, which enables them to have end-to-end 651 connectivity. 653 One limitation applies, if "delayed translation" is used (e.g., 654 translation at the LSN instead of the CPE). If devices using 655 "delayed translation" want to talk to each other they SHOULD use A+P 656 addresses or out-of-band addressing. 658 4. Deployment Scenarios 660 4.1. A+P for Broadband Providers 662 Large broadband providers do not have enough IPv4 address space to 663 provide every customer with a single IP. The natural solution is 664 sharing a single IP address among many customers. Multiplexing 665 customers is usually accomplished by allocating different port 666 numbers to different customers somewhere within the network of the 667 provider. 669 In this document we use the following terms and assumptions: 671 1. Customer Premises Equipment (CPE), i.e. cable/DSL modem. 673 2. Provider Edge Router (PE), AKA customer aggregation router 675 3. Port Range Router (PRR), edge behind which A+P addresses are 676 used. 678 4. Provider Border Router (BR), providers edge to other providers 680 5. Network Core Routers (Core), provider routers which are not at 681 the edge. 
683 It is expected that, when the provider wishes to enable A+P for a 684 customer or a range of customers, the CPE can be upgraded or replaced 685 to support A+P encaps/decaps functionality. Ideally the CPE also 686 provides NATing functionality. Further, it is expected that at least 687 another component in the ISP network provides the corresponding A+P 688 functionality, and hence is able to establish an A+P subsystem with 689 the CPE. This device is referred to as A+P router or port-range 690 router (PRR), and could be located close to PE routers. The core of 691 the network MUST support the tunneling protocol (which SHOULD be 692 IPv6, as per Constraint 10) but MAY be another tunneling technology 693 when necessary. In addition, we do not wish to restrict any 694 initiative of customers who might want to run an A+P-capable network 695 on or behind their CPE. To satisfy both Constraints 1 and 3 696 unmodified legacy hosts should keep working seamlessly, while 697 upgraded/new end-systems should be given the opportunity to exploit 698 enhanced features. 700 4.2. A+P for Mobile Providers 702 In the case of mobile service provider the situation is slightly 703 different. The A+P border is assumed to be the gateway (e.g., GGSN/ 704 PDN GW of 3GPP, or ASN GW of WiMAX). The need to extend the address 705 is not within the provider network, but on the edge between the 706 mobile phone devices and the gateway. While desirable, IPv6 707 connectivity may or may not be provided. 709 For mobile providers we use the following terms and assumptions: 711 1. Provider Network (PN) 713 2. Gateway (GW) 715 3. Mobile Phone device (phone) 717 4. Devices behind phone, e.g., laptop computer connecting via phone 718 to Internet. 720 We expect that the gateway has a pool of IPv4 addresses and is always 721 in the data-path of the packets. Transport between the gateway and 722 phone devices is assumed to be an end-to-end layer-2 tunnel. 
We 723 assume that phone as well as gateway can be upgraded to support A+P. 724 However, some applications running on the phone or devices behind the 725 phone (such as laptop computers connecting via the phone), are not 726 expected to be upgraded. Again, while we do not expect that devices 727 behind the phone will be A+P aware/upgraded we also do not want to 728 hinder their evolution. In this sense the mobile phone would be 729 comparable to the CPE in the broadband provider case; the gateway to 730 the PRR/LSN box in the network of the broadband provider. 732 4.3. A+P from the provider network perspective 734 ISPs suffering from IPv4 address space exhaustion are interested in 735 achieving a high address space compression ratio. In this respect, 736 an A+P subsystem allows much more flexibility than traditional NATs: 737 the NAT can be placed at the customer, and/or in the provider 738 network. In addition hosts or applications can request ports and 739 thus have untranslated end-to-end connectivity. 741 +---------------------------+ 742 private | +------+ A+P-in +-----+ | dual-stacked 743 (RFC1918) --|-| CPE |==-IPv6-==| PRR |-|-- network 744 space | +------+ tunnel +-----+ | (public addresses) 745 | ^ +-----+ | 746 | | IPv6-only | LSN | | 747 | | network +-----+ | 748 +----+----------------- ^ --+ 749 | | 750 on customer within provider 751 premises and control network 753 A simple A+P subsystem example 755 Figure 6 757 Consider the deployment scenario in Figure 6, where an A+P subsystem 758 is formed by the CPE and a port-range router (PRR) within the ISP 759 core network, preferably close to the customer edge, and represents 760 the border from where on packets are forwarded based on address and 761 port. The provider MAY deploy a LSN co-located with the PRR to 762 handle packets that have not been translated by the CPE. In such a 763 configuration, the ISP allows the customer to freely decide whether 764 the translation is done at the CPE or at the LSN. 
In order to 765 establish the A+P subsystem, the CPE will be configured automatically 766 (e.g. via a signaling protocol, that conforms to the requirements 767 stated above). 769 Note that the CPE in the example above is only provisioned with an 770 IPv6 address on the external interface. 772 +------------ IPv6-only transport ------------+ 773 | +---------------+ | | | 774 | |A+P-application| | +--------+ | +-----+ | dual-stacked 775 | | on end-host |=|==| CPE w/ |==|==| PRR |-|-- network 776 | +---------------+ | +--------+ | +-----+ | (public addresses) 777 +---------------+ | +--------+ | +-----+ | 778 private IPv4 <-*--+->| NAT | | | LSN | | 779 address space \ | +--------+ | +-----+ | 780 for legacy +|--------------|----------+ 781 hosts | | 782 | | 783 end-host with | CPE device | provider 784 upgraded | on customer | network 785 application | premises | 787 An extended A+P subsystem with end-host running A+P-aware 788 applications 790 Figure 7 792 Figure 7 shows an example of how an upgraded application running on a 793 legacy end-host can connect. The legacy host is provisioned with a 794 private IPv4 address allocated by the CPE. Any packet sent from the 795 legacy host will be NATed either at the CPE (if configured to do so), 796 or at the LSN (if available). 798 An A+P-aware application running on the end-host MAY use the 799 signaling described in Section 3.1 to connect to the A+P-subsystem. 800 In this case, the application will be delegated some space in the A+P 801 address realm, and will be able to contact the external realm (i.e., 802 the public Internet) without the need for translation. 804 Note that part of A+P signaling is that the NATs are optional. 805 However, if neither the CPE nor the PRR provides NATing 806 functionality, then it will not be possible to connect legacy end- 807 hosts. 809 To enable packet forwarding with A+P, the ISP MUST install at its A+P 810 border a PRR which encaps/decaps packets. 
However, to achieve a 811 higher address space compression ratio and/or to support CPEs without 812 NATing functionality, the ISP MAY decide to provide an LSN as well. 813 If no LSN is installed in some part of the ISP's topology, all CPE in 814 that part of the topology MUST support NAT functionality. For 815 reasons of scalability, it is assumed that the PRR is located within 816 the access-portion of the network. The CPE would be configured 817 automatically (e.g. via an extended DHCP or NAT-PMP, which has the 818 signaling requirements stated above) with the address of the PRR, and 819 if a LSN is being provided or not. Figure 6 illustrates a possible 820 deployment scenario. 822 4.4. Dynamic allocation of port ranges 824 Allocating a fixed number of ports to all CPE may lead to exhaustion 825 of ports for high usage customers. This is a perfect recipe for 826 upsetting more demanding customers. On the other hand, allocating to 827 all customers ports sufficient to match the needs of peak users will 828 not be very efficient. A mechanism for dynamic allocation of port 829 ranges allows the ISP to achieve two goals; a more efficient 830 compression ratio of number of customers on one IPv4 address and, on 831 the other hand, not limiting the more demanding customers' 832 communication. 834 Additional allocation of ports, or port ranges may be made after an 835 initial static allocation of ports. 837 The following mechanism applies to NAT functionality in CPE only: If 838 a customer has an arrangement with the ISP for well-known ports, and 839 the PRR allocates to this CPE WKP range, this range may be used for 840 end-to-end communications to a server behind CPE with public IP 841 address or if customer configures so for inbound NAT (1:1 or port 842 forwarding). This function has a fixed range of ports and is not 843 considered in the dynamic pool allocation mechanism. 
On the other hand, if the customer configures the NAT function to access
the Internet from a private address pool behind the CPE, this mechanism
is applied automatically. The NAT already keeps track of its translation
tables, so only a small "daemon" needs to be developed and implemented
by the CPE manufacturer to keep track of the allocated port ranges and
how many ports are in use. At 90% usage, the dynamic allocation daemon
could signal to the PRR the need for additional ports. A downside of
this mechanism is that the port allocation to a CPE might grow quite
large without an additional mechanism that returns unused port ranges
to the PRR's pool. This may be dealt with by requiring the NAT to
allocate ports for translation sequentially and to reuse released ports
for new requests, so that the use of ports is controlled and
unfragmented ranges may be returned to the pool. Another, less elegant,
way is to reset the additional allocations to zero every 24 hours,
leaving only the first allocation. The mechanism would then request any
additional allocations it needs within a very short time, so the
customer is unlikely to notice the event.

The mechanism would prefer allocations of port ranges from the same IP
address as the initial allocation. If it is not possible to allocate an
additional port range from the same IP address, then the mechanism can
allocate a port range from another IP address within the same subnet.
With every additional port range allocation, the PRR updates its routing
table. The mechanism for allocating additional port ranges may be part
of the normal signaling that is used to authenticate the CPE to the ISP.

The ISP controls the dynamic allocation of port ranges by the PRR by
setting the initial allocation size and the maximum number of
allocations per CPE, or the maximum allocations per subscription,
depending on subscription level.
There is a general observation that the more demanding customers use
around 1024 ports when communicating heavily. So, for example, a first
suggestion might be 128 ports initially, followed by dynamic allocations
of ranges of 128 ports, up to a maximum of 511 additional allocations.
A configured maximum number of allocations could be used to prevent one
customer from acting in a destructive manner should they become
infected. The maximum number of allocations might also be more finely
grained, with parameters for how many allocations a user may request
per time frame. If this is used, evasive applications may need to be
limited in their bad behavior; for example, allowing one additional
allocation per minute would considerably slow a port request storm.

There is likely no minimum request size. This is because A+P-aware
applications running on end-hosts MAY request a single port (or a few
ports) for the CPE to be contacted on (e.g., VoIP clients register a
public IP address and a single delegated port from the CPE, and accept
incoming calls on that port). The implementation on the CPE or PRR will
dictate how to handle such requests for smaller blocks. For example,
half of the available blocks might be used for "block-allocations", 1/6
for single-port requests, and the rest for NATing.

Another possible mechanism to allocate additional ports is UPnP/NAT-PMP
(as defined in Section 3.1), if applications behind the CPE support it.
In the case of the LSN implementation (DS-Lite), as described in the A+P
overall architecture section, signaling packets are simply forwarded by
the CPE to the LSN and back to the host running the application which
requested the ports, and the PRR allocates the requested port to the
appropriate CPE. The same behavior may be chosen with AFTR, if the
requested ports are outside of the static initial port allocation.
If a full A+P implementation is selected, then UPnPv2/NAT-PMP packets
are accepted and processed by the CPE, and the requested port number is
communicated through the normal signaling mechanism between the CPE and
PRR tunnel endpoints (DHCP or IPCP).

4.5. Overall A+P architecture

                          A+P architecture

       IPv4           Full-A+P          AFTR            CGN
        |                |               |               |
   <-- Full IPv4 ---- Port range ---- Port range ---- Provider --->
       allocated      & dynamic       & LSN           NAT ONLY
                      allocation      (NAT on CPE     (No mechanism
       (no NAT)       (NAT on CPE)    and on LSN)     for customer to
                                                      bypass CGN)

                 Figure 8: A+P overall architecture

The A+P architecture defines various options that can be deployed within
an ISP. Figure 8 shows the spectrum of deployment options. On the far
left is today's status quo: an unrestricted IPv4 address with the full
port range. Full-A+P refers to a port-range allocation from the ISP; the
customer must operate A+P-aware devices, and no NATing functionality is
provided by the ISP. AFTR, such as DS-Lite
[I-D.ietf-softwire-dual-stack-lite], is a hybrid: there is a NAT present
in the core (in this draft referred to as LSN), but the user has the
option to "bypass" that NAT in one form or another, for example via A+P
or NAT-PMP. Finally, a provider-only CGN places a NAT in the provider's
core and does not allow the customer to "bypass" the translation process
or modify ALGs on the NAT; the customer is provider-locked. Note as well
that all options (besides full IPv4) require some form of tunneling
mechanism (e.g., 4in6) and a signaling mechanism (see Section 3.1).

4.6. Example of A+P-forwarded packets

This section provides a detailed example of A+P setup, configuration,
and packet flow from an end-host behind an A+P-upgraded provider to any
host in the IPv4 Internet, and of how the return packets flow back. The
following example discusses an A+P-unaware end-host, where the NATing is
done at the CPE.
Figure 9 illustrates how the CPE receives an IPv4 packet from the
end-user device. We first describe the case where the CPE has been
configured to provide the NAT functionality (e.g., by the customer
through a website, or via automatic signaling). In the following, we
call a packet which is translated at the CPE an A+P-forwarded packet, by
analogy with the port-forwarding function employed in today's CPEs.
Upon receiving a packet from the internal interface, the CPE NATs it and
forwards it to the PRR. The NAT on the CPE is assumed to store the
5-tuple (source_IPv4, source_port, destination_IPv4, destination_port,
tunnel-interface).

When the PRR receives the A+P-forwarded packet, it decapsulates the
inner IPv4 packet and checks the source address and port. If the source
address and port match the CPE's A+P address, then the PRR simply
forwards the decapsulated packet onward. This is always the case for
A+P-forwarded packets. Otherwise, the PRR assumes that the packet is not
A+P-forwarded and passes it to the LSN function, which in turn NATs the
packet and then releases it into the Internet. Figure 9 shows the packet
flow for an outgoing A+P-forwarded packet.
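
The PRR's check on an outgoing tunneled packet, as described in the
preceding paragraph, can be sketched as follows. This is only an
illustrative Python sketch, not part of the specification: the names
Delegation, DELEGATIONS, and classify_outbound are hypothetical, and the
per-tunnel lookup table is an assumed way of holding the PRR's state.
The address 12.0.0.3, the port range 100-200, and the tunnel endpoint
a::2 are taken from the example in this section. Handing packets from an
unknown tunnel to the LSN is likewise an assumption of the sketch.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Delegation:
    """A+P address delegated to one CPE: a public IPv4 address plus a port range."""
    address: IPv4Address
    first_port: int
    last_port: int

    def covers(self, address: IPv4Address, port: int) -> bool:
        """True if (address, port) lies inside the delegated A+P space."""
        return address == self.address and self.first_port <= port <= self.last_port


# Hypothetical PRR state: tunnel endpoint of each CPE -> its delegation.
DELEGATIONS = {"a::2": Delegation(IPv4Address("12.0.0.3"), 100, 200)}


def classify_outbound(tunnel: str, src_address: str, src_port: int) -> str:
    """Decide what the PRR does with a decapsulated packet from `tunnel`.

    Returns "forward" if the source already matches the CPE's A+P address
    (the packet was translated at the CPE), otherwise "lsn".
    """
    delegation = DELEGATIONS.get(tunnel)  # unknown tunnel -> LSN (assumption)
    if delegation is not None and delegation.covers(IPv4Address(src_address), src_port):
        return "forward"
    return "lsn"


if __name__ == "__main__":
    print(classify_outbound("a::2", "12.0.0.3", 100))   # forward
    print(classify_outbound("a::2", "10.0.0.2", 8000))  # lsn
```

The first call corresponds to a packet the CPE has already translated
into its delegated space; the second corresponds to an untranslated
packet, which would be handed to the LSN function.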

                       +-----------+
                       |   Host    |
                       +-----+-----+
                          |  |  10.0.0.2
        IPv4 datagram 1   |  |
                          |  |
                          v  |  10.0.0.1
                   +---------|---------+
                   |CPE      |         |
                   +--------|||--------+
                          | |||  a::2
                          | |||  12.0.0.3 (100-200)
        IPv6 datagram 2   | |||
                          | |||<-IPv4-in-IPv6
                          | |||
                     -----|-|||-------
                   /      | |||        \
                  |  ISP access network |
                   \      | |||        /
                     -----|-|||-------
                          | |||
                          v |||  a::1
                   +--------|||--------+
                   |PRR     |||        |
                   +---------|---------+
                          |  |  12.0.0.1
        IPv4 datagram 3   |  |
                     -----|--|--------
                   /      |  |         \
                  |  ISP network /      |
                   \      Internet     /
                     -----|--|--------
                          |  |
                          v  |  128.0.0.1
                       +-----+-----+
                       | IPv4 Host |
                       +-----------+

       Figure 9: Forwarding of Outgoing A+P-forwarded Packets

   +-----------------+--------------+-----------------------------+
   | Datagram        | Header field | Contents                    |
   +-----------------+--------------+-----------------------------+
   | IPv4 datagram 1 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 10.0.0.2                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 8000                        |
   | --------------- | ------------ | --------------------------- |
   | IPv6 datagram 2 | IPv6 Dst     | a::1                        |
   |                 | IPv6 Src     | a::2                        |
   |                 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.3                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 100                         |
   | --------------- | ------------ | --------------------------- |
   | IPv4 datagram 3 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.3                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 100                         |
   +-----------------+--------------+-----------------------------+

                      Datagram header contents

An incoming packet undergoes the reverse process. When the PRR receives
an IPv4 packet on an external interface, it first checks whether the
destination port number falls in a delegated range or not. If the
address space was delegated, then PRR encapsulates the incoming packet
and forwards it through the appropriate tunnel for that IP/port range.
If the address space was not delegated, the packet is handed to the LSN
to check whether a mapping is available.

Figure 10 shows how an incoming packet is forwarded, under the
assumption that the port number matches the port range which was
delegated to the CPE.

                       +-----------+
                       |   Host    |
                       +-----+-----+
                          ^  |  10.0.0.2
        IPv4 datagram 3   |  |
                          |  |
                          |  |  10.0.0.1
                   +---------|---------+
                   |CPE      |         |
                   +--------|||--------+
                          ^ |||  a::2
                          | |||  12.0.0.3 (100-200)
        IPv6 datagram 2   | |||
                          | |||<-IPv4-in-IPv6
                          | |||
                     -----|-|||-------
                   /      | |||        \
                  |  ISP access network |
                   \      | |||        /
                     -----|-|||-------
                          | |||
                          | |||  a::1
                   +--------|||--------+
                   |PRR     |||        |
                   +---------|---------+
                          ^  |  12.0.0.1
        IPv4 datagram 1   |  |
                     -----|--|--------
                   /      |  |         \
                  |  ISP network /      |
                   \      Internet     /
                     -----|--|--------
                          |  |
                          |  |  128.0.0.1
                       +-----+-----+
                       | IPv4 Host |
                       +-----------+

       Figure 10: Forwarding of Incoming A+P-forwarded Packets

   +-----------------+--------------+-----------------------------+
   | Datagram        | Header field | Contents                    |
   +-----------------+--------------+-----------------------------+
   | IPv4 datagram 1 | IPv4 Dst     | 12.0.0.3                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 100                         |
   |                 | TCP Src      | 80                          |
   | --------------- | ------------ | --------------------------- |
   | IPv6 datagram 2 | IPv6 Dst     | a::2                        |
   |                 | IPv6 Src     | a::1                        |
   |                 | IPv4 Dst     | 12.0.0.3                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 100                         |
   |                 | TCP Src      | 80                          |
   | --------------- | ------------ | --------------------------- |
   | IPv4 datagram 3 | IPv4 Dst     | 10.0.0.2                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 8000                        |
   |                 | TCP Src      | 80                          |
   +-----------------+--------------+-----------------------------+

                      Datagram header contents

Note that datagram 1 travels untranslated up to the CPE, thus
the customer has the same control over the translation as today with a
home gateway that offers customizable port-forwarding.

4.7. Forwarding of standard packets

Packets for which the CPE does not have a corresponding port forwarding
rule are tunneled to the PRR, which provides the LSN function. We
underline that the LSN MUST NOT use the delegated space for NATing. See
[I-D.ietf-softwire-dual-stack-lite] for network diagrams which
illustrate the packet flow in this case.

4.8. Handling ICMP

ICMP is problematic for all NATs, because it lacks port numbers. A+P
routing exacerbates the problem.

Most ICMP messages fall into one of two categories: error reports, or
ECHO/ECHO reply (commonly known as "ping"). For error reports, the
offending packet header is embedded within the ICMP packet; NAT devices
can then rewrite that portion and route the packet to the actual
destination host. This functionality will remain the same with A+P;
however, the PRR will need to examine the embedded header to extract the
port number, while the A+P gateway will do the necessary rewriting.

ECHO and ECHO reply are more problematic. For ECHO, the A+P gateway
device must rewrite the "Identifier" and perhaps "Sequence Number"
fields in the ICMP request, treating them as if they were port numbers.
This way, the PRR can build the correct A+P address for the returning
ECHO replies, so they can be correctly routed back to the appropriate
host in the same way as TCP/UDP packets. (Pings originated from an
external domain/legacy Internet towards an A+P device are not
supported.)

4.9. Limitations of the A+P approach

One limitation that A+P shares with any other IP address-sharing
mechanism is the availability of well-known ports. In fact, services run
by customers that share the same IP address will be distinguished by the
port number.
As a consequence, it will be impossible for two customers who share the
same IP address to run services on the same port (e.g., port 80).
Unfortunately, working around this limitation usually implies
application-specific hacks (e.g., HTTP and HTTPS redirection),
discussion of which is out of the scope of this document. Of course, a
provider might charge more for giving a customer the well-known port
range, 0-1023, thus allowing the customer to provide externally
available services. Many applications require the availability of
well-known ports. Those applications are not expected to work in an A+P
environment unless they can adapt to work with different ports. However,
such applications do not work behind today's NATs either.

Another problem which is common to all NATs is coexistence with IPsec.
In fact, a NAT which also translates port numbers prevents AH and ESP
from functioning properly, both in tunnel and in transport mode. In this
respect, we stress that, since an A+P subsystem exhibits the same
external behavior as a NAT, well-known workarounds (such as [RFC3715])
can be employed.

5. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an
RFC.

6. Security Considerations

The primary security issue any time a NAT is mentioned is the implicit
firewall provided by a NAT. Any proposal to eliminate NATs raises the
spectre of insecure hosts lying naked before a hostile Internet. For a
number of reasons, we do not think this is a serious issue here. If
nothing else, NATs are not really security devices; their protective
value is limited.

A NAT owned by a customer, whether a home consumer or a large
enterprise, is under the control of that customer.
All machines on the customer's side of the NAT have unfettered access to
other machines on the same side; generally, this is what is desired. A+P
NATs do not change this, as the customer still controls what is being
NATed. LSN does not change the access property, either. However, with a
CGN without A+P there are *many* machines on the inside of the
translation, not all of which are in the customer's administrative
domain. Unless other firewall mechanisms are employed, LSNs create added
risk of unauthorized access.

By contrast, the protection scope of an A+P NAT is, by definition, at
the boundary to the customer network. The access properties are thus
precisely what traditional NATs have provided.

There is one notable exception to this point. Inbound packets addressed
to the assigned port number range are passed through unchanged, even if
no outbound packets were sent to the originator. While this allows
customers to run their own servers on certain ports, it also allows
attackers to probe these servers without the protection provided today
by provider-supplied NAT boxes. The issue is not that internal machines
are addressable -- that is an inevitable corollary to servers being run
-- but that it may represent a change from today's behavior.
Furthermore, the effect on the customer varies greatly, depending on
what port number range they are assigned; someone who is assigned 0-4K
derives more benefit and runs more risk than someone who is assigned
48K-52K, since the latter is in the IANA-assigned dynamic port range.

A useful middle ground would be the provision of a customer-controllable
switch in the CPE to control what happens to such packets. If filtering
is to be done, state must be kept, which might be costly.
1212 This suggests that perhaps it should only be done in the CPE if it is 1213 replacing current CPE that provides NAT functionality. If 1214 applications on end-hosts installed A+P gateways, they might open up 1215 ports untranslated. 1217 Note that, regardless of the existence of such an option, the A+P 1218 gateway will need customer-controllable port number-mapping 1219 capability, as most customers will not be assigned a range which 1220 corresponds to the servers they wish to run. 1222 With CGN/LSNs, tracing hackers, spammers and other criminals will be 1223 extremely difficult, requiring logging, recording, and storing of all 1224 connection based mapping information. The need for storage implies a 1225 tradeoff. On one hand, the LSNs can manage addresses and ports as 1226 dynamically as possible, in order to maximize aggregation. On the 1227 other hand, the more quickly the mapping between private and public 1228 space changes, the more information needs to be recorded. This would 1229 not only cause concern for law enforcement services, but also for 1230 privacy advocates. 1232 A+P offers a better set of tradeoffs. All that needs to be logged is 1233 the allocation of a range of port numbers to a customer. By design, 1234 this will be done rarely, improving scalability. If the NAT 1235 functionality is moved further up the tree, the logging requirement 1236 will be as well, increasing the load on one node, but giving it more 1237 resources to allocate to a busy customer, perhaps decreasing the 1238 frequency of allocation requests. 1240 The other extreme is A+P NAT on the customer premises. Such a node 1241 would be no different than today's NAT boxes, which do no such 1242 logging. We thus conclude that A+P is no worse than today's 1243 situation, while being considerably better than CGNs. 1245 7. Authors 1247 This document has 8 primary authors, which is not allowed in the 1248 header of Internet-Drafts. 
This is the list of actual authors of 1249 this document. 1251 Gabor Bajko 1252 Nokia 1253 Email: gabor(dot)bajko(at)nokia(dot)com 1255 Steven M. Bellovin 1256 Columbia University 1257 1214 Amsterdam Avenue 1258 MC 0401 1259 New York, NY 10027 1260 US 1261 Phone: +1 212 939 7149 1262 Email: bellovin@acm.org 1264 Randy Bush 1265 Internet Initiative Japan 1266 5147 Crystal Springs 1267 Bainbridge Island, Washington 98110 1268 US 1269 Phone: +1 206 780 0431 x1 1270 Email: randy@psg.com 1271 Luca Cittadini 1272 Universita' Roma Tre 1273 via della Vasca Navale, 79 1274 Rome, 00146 1275 Italy 1276 Phone: +39 06 5733 3215 1277 Email: luca.cittadini@gmail.com 1279 Alain Durand 1280 Comcast 1281 1 Comcast Center 1282 Philadelphia, PA 1283 US 1284 alain_durand@cable.comcast.com 1286 Olaf Maennel 1287 Loughborough University 1288 Department of Computer Science - N.2.03 1289 Loughborough 1290 United Kindom 1291 Phone: +44 115 714 0042 1292 Email: o@maennel.net 1294 Teemu Savolainen 1295 Nokia 1296 Hermiankatu 12 D 1297 TAMPERE, FI-33720 1298 Finland 1299 Email: teemu.savolainen@nokia.com 1301 Jan Zorz 1302 go6.si 1303 Frankovo naselje 165 1304 Skofja Loka 4220 1305 Slovenia 1306 Phone: +38659042000 1307 Email: jan@go6.si 1309 8. Acknowledgments 1311 The authors wish to especially thank Remi Despres, and Pierre Levis 1312 for their help on the development of the A+P approach. We also thank 1313 David Ward for review, constructive criticism, and interminable 1314 questions, and Dave Thaler for useful criticism on "stackable" A+P 1315 gateways. We would also like to thank the following persons for 1316 their feedback on earlier versions of this work: Rob Austein, Gert 1317 Doering, Dino Farinacci, Russ Housley, and Ruediger Volk. 1319 9. References 1321 9.1. Normative References 1323 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1324 Requirement Levels", BCP 14, RFC 2119, March 1997. 1326 9.2. Informative References 1328 [BCP38] Ferguson, P. and D. 
Senie, "Network Ingress Filtering: 1329 Defeating Denial of Service Attacks which employ IP Source 1330 Address Spoofing", BCP 38, May 2000. 1332 [I-D.bajko-pripaddrassign] 1333 Bajko, G., Savolainen, T., Boucadair, M., and P. Levis, 1334 "Port Restricted IP Address Assignment", 1335 draft-bajko-pripaddrassign-01 (work in progress), 1336 March 2009. 1338 [I-D.boucadair-dhcpv6-shared-address-option] 1339 Boucadair, M., Levis, P., Grimault, J., Savolainen, T., 1340 and G. Bajko, "Dynamic Host Configuration Protocol 1341 (DHCPv6) Options for Shared IP Addresses Solutions", 1342 draft-boucadair-dhcpv6-shared-address-option-00 (work in 1343 progress), May 2009. 1345 [I-D.boucadair-port-range] 1346 Boucadair, M., Levis, P., Bajko, G., and T. Savolainen, 1347 "IPv4 Connectivity Access in the Context of IPv4 Address 1348 Exhaustion: Port Range based IP Architecture", 1349 draft-boucadair-port-range-02 (work in progress), 1350 July 2009. 1352 [I-D.boucadair-pppext-portrange-option] 1353 Boucadair, M., Levis, P., Grimault, J., and A. 1354 Villefranque, "Port Range Configuration Options for PPP 1355 IPCP", draft-boucadair-pppext-portrange-option-01 (work in 1356 progress), July 2009. 1358 [I-D.ietf-softwire-dual-stack-lite] 1359 Durand, A., Droms, R., Haberman, B., Woodyatt, J., Lee, 1360 Y., and R. Bush, "Dual-stack lite broadband deployments 1361 post IPv4 exhaustion", 1362 draft-ietf-softwire-dual-stack-lite-01 (work in progress), 1363 July 2009. 1365 [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and 1366 E. Lear, "Address Allocation for Private Internets", 1367 BCP 5, RFC 1918, February 1996. 1369 [RFC3715] Aboba, B. and W. Dixon, "IPsec-Network Address Translation 1370 (NAT) Compatibility Requirements", RFC 3715, March 2004. 1372 Author's Address 1374 Randy Bush (editor) 1375 Internet Initiative Japan 1376 5147 Crystal Springs 1377 Bainbridge Island, Washington 98110 1378 US 1380 Phone: +1 206 780 0431 x1 1381 Email: randy@psg.com