idnits 2.17.1

draft-shyam-real-ip-framework-61.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 28, 2020) is 1543 days in the past.  Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: 'RFC6177' on line 835

  -- Looks like a reference, but probably isn't: 'RFC4692' on line 840

  == Unused Reference: '12' is defined on line 999, but no explicit
     reference was found in the text

  == Unused Reference: '13' is defined on line 1004, but no explicit
     reference was found in the text

  == Unused Reference: '14' is defined on line 1007, but no explicit
     reference was found in the text

  == Unused Reference: '15' is defined on line 1010, but no explicit
     reference was found in the text

  == Unused Reference: '16' is defined on line 1013, but no explicit
     reference was found in the text

  == Unused Reference: '17' is defined on line 1016, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 4893 (ref. '4') (Obsoleted by RFC 6793)

  -- Obsolete informational reference (is this intentional?): RFC 1771
     (ref. '14') (Obsoleted by RFC 4271)

  -- Obsolete informational reference (is this intentional?): RFC 1883
     (ref. '15') (Obsoleted by RFC 2460)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (ref. '16') (Obsoleted by RFC 8200)

     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

INTERNET DRAFT                                           S. Bandyopadhyay
draft-shyam-real-ip-framework-61.txt                     January 28, 2020
Intended status: Experimental
Expires: July 28, 2020

     An Architectural Framework of the Internet for the Real IP World
                  draft-shyam-real-ip-framework-61.txt

Abstract

   This document proposes an architectural framework for the Internet
   in the real IP world. It describes how a three-tier mesh structured
   hierarchy can be established in a large address space by
   fragmenting it into regions and into sub regions inside each of
   them. It shows how to make the transition from private IP to real
   IP without making significant changes to the existing network.
   Building on the useful work done through IPv6, it provides all the
   necessary inputs from which a specification of IP with a 64 bit
   address space may emerge.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 28, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1. Introduction
   2. Background
   3. A Three-tier mesh structured hierarchical network
      3.1. Route propagation
      3.2. Determination of prefix lengths
         3.2.1. A pseudo optimal distribution of prefixes in
                a 64 bit architecture
         3.2.2. Whether to go for a two-tier or three-tier hierarchy
      3.3. Issues related to Satellite communications
   4. Provider Independent addressing, name services and multihoming
   5. Issues related to IP mobility
      5.1. Changes expected with the specifications related
           to IP mobility
   6. Refinements over existing IPv6 specification
   7. Distributed processing and Multicasting
   8. Transition to real IP from private IP
   9. IANA Consideration
   10. Security Consideration
   11. Acknowledgments
   12. Normative References
   13. Informative References
   14. Author's Address

1. Introduction

   The transition from IPv4 to IPv6 is in progress. Work has been done
   to upgrade individual nodes (workstations) from IPv4 to IPv6. There
   are also established documents on making routers and switches
   handle IPv4 and IPv6 packets simultaneously in order to make the
   transition possible [1]. The CIDR [2] based hierarchical
   architecture of the existing 32-bit system is expected to continue
   in IPv6 with a much larger address space. There are concerns that
   BGP tables are becoming too large in the existing system [3]. At
   the same time, there are proposals to extend the Autonomous System
   number from 16 bits to 32 bits to meet demand [4]. The challenge
   lies in making the transition from IPv4 to a real IP world smooth,
   with the fewest changes possible.

   The term "real IP environment" refers to an environment where hosts
   in a customer network possess globally unique IP addresses and
   communicate with the rest of the world without the help of NAT [5].
   This document reflects the changes required to the BSD 4.4 source
   code wherever applicable.

2. Background

   The existing system works with an Autonomous System (AS) layer and
   an inter-AS layer, using the CIDR approach. To meet demand within
   the 32-bit address space, Autonomous Systems of various sizes
   maintain a CIDR based hierarchical architecture. With the help of
   NAT [5], a stub network can maintain a user ID space as large as a
   class A network and can meet its need to communicate with the rest
   of the world with very few real IP addresses. With CIDR and NAT
   applied together across the entire space, most of the 32-bit
   address space is effectively used as network ID.

   With a traditional CIDR based hierarchy, a node with a shorter
   prefix (a larger block) can be divided into a number of nodes with
   longer prefixes (smaller blocks). Each of these nodes can be
   subdivided further in the same way, and the process can continue
   until no further division is possible. The point worth noting is
   that at each level the network designer has to anticipate the
   future expansion of the network, keeping in mind that the resource
   must not be exhausted at any point in time. This leads the designer
   to allocate far more resources than are needed, which leaves
   address space unused. The problem is aggravated once a resource
   does get exhausted. For example, a node with prefix /16 can be
   divided into a number of nodes with prefix /24. If any one of the
   /24 nodes gets exhausted, the resources of the other /24 nodes
   cannot be used even if they are available.

   In the IPv4 environment, service providers make a desperate attempt
   to provide internet services with the help of NAT. For example, a
   large educational institute meets its current requirement with 4
   real IP addresses: one for its mail server, one for its web server,
   one for its ftp server and one for its proxy server, which provides
   web based services to all of its users. In general, these services
   are used by an organization of any size (it may have 400 users or
   even 40000). In the current scenario, the CIDR based tree has been
   built using these components together. When private IP is replaced
   with real IP, each customer network will require IP addresses based
   on its size and requirement.

   Transitioning from private IP to real IP basically requires the
   following components:

   o  A solution for site multihoming with provider assigned address
      space
   o  A strategy to replace private IP with real IP
   o  A solution to uniquely identify a host in a real IP environment
   o  A solution that lets individual nodes and routers/switches work
      with IPv4 and next generation IP simultaneously

   A solution for site multihoming has been provided in a separate
   document [10]. Section 8 shows how to make the transition from
   private IP space to real IP space with provider assigned addresses,
   using the CIDR based approach itself and without reorganizing the
   existing provider network. Section 4 provides a solution for
   identifying a host uniquely with a number in a real IP environment.
   RFC 4213 [1] has already described the transition mechanism from
   IPv4 to IPv6 for individual nodes and routers.

   Transitioning to real IP will eliminate the extra routing entries
   associated with multihomed sites and thus will reduce the size of
   the BGP table substantially. Assignment of addresses requires an
   architectural framework. It may continue with the existing CIDR
   based architecture (provided that transitioning to real IP is
   enough to handle all routing related issues for ever) or may adopt
   a different approach. A mesh structured hierarchy will reduce the
   growth of routing entries in a CIDR based environment and will be
   convenient for distributing network resources in a suitable manner
   in the long run.

   This document also tries to resolve and enhance several issues that
   were carried over as part of the deployment of IPv6. It shows that
   a 64 bit address space is good enough for all practical purposes.
   Building on the useful work done through IPv6, it provides all the
   necessary inputs from which a specification of IP with a 64 bit
   address space may emerge.

3. A Three-tier mesh structured hierarchical network

   As Autonomous Systems of various sizes are supported, Autonomous
   Systems and the nodes inside them can be viewed as graphically
   lying on the same plane within the address space. If the network
   can be viewed as lying on different planes, routing issues become
   simpler. If the network is designed with a fixed prefix length for
   the Autonomous System everywhere, routing information for the rest
   will be confined to the other part of the network prefix. This
   means that the maximum AS size is assigned to every AS irrespective
   of its actual size. This can be made possible by taking advantage
   of a large address space and dividing it into a number of regions
   of fixed size. Thus the entire network can be viewed as a network
   of inter-AS layer nodes. Each node in the inter-AS layer can act
   only as a router in the inter-AS layer, or as a router in the
   inter-AS layer with an Autonomous System attached to it at a single
   point of attachment, or as an Autonomous System with multiple
   Autonomous System border routers (ASBR) appearing like a mesh. Thus
   a two-tier mesh structured hierarchy is established between the AS
   layer and the inter-AS layer, with each AS having a fixed prefix
   length.

   By definition, an Autonomous System is a small area within the
   entire network that maintains its own independent identity and
   communicates with the rest of the world through specific border
   routers. In a similar manner, if a larger area (say, a region or
   state) is considered as a network of Autonomous Systems that
   maintains its own identity by communicating with the rest of the
   world through some border routers (say, state border routers), a
   mesh structured hierarchy can be established within the inter-AS
   layer. The inter-AS layer will be split into inter-AS-top and
   inter-AS-bottom.
   To maintain this hierarchy, each node of inter-AS-top needs to have
   multiple regional or state border routers (say, SBRs) through which
   it will communicate with the rest of the world, in the same manner
   as an Autonomous System maintains ASBRs. Thus, the entire network
   will appear as a network of nodes of the inter-AS-top layer. To
   maintain the hierarchy, each node of inter-AS-top needs to have a
   fixed prefix length, i.e. each node of inter-AS-top will be
   assigned a maximum (fixed) number of Autonomous System nodes.

   Thus, with a three-tier mesh structured hierarchy in the network
   layer, the network ID can be viewed as A.B.C. If pA, pB and pC are
   the prefix lengths of the inter-AS-top, inter-AS-bottom and AS
   layers respectively, there will be up to 2^pA nodes at the topmost
   layer, 2^pB nodes at the inter-AS-bottom layer within each top
   layer node and 2^pC nodes at the AS layer within each AS. Thus the
   entire space is divided into a fixed number of regions and each
   region is divided into a fixed number of sub regions. This division
   is supposed to be made based on geography, population density,
   demand and related factors.

   Let nMaxInterASTopNodes be the maximum possible number of nodes
   assigned at the topmost layer, nMaxInterASBottomNodes that at the
   inter-AS-bottom layer and nMaxASNodes that at the AS layer, where
   nMaxInterASTopNodes <= 2^pA, nMaxInterASBottomNodes <= 2^pB and
   nMaxASNodes <= 2^pC.

3.1. Route propagation

   With the hierarchy established, routing information that is
   established inside one node of inter-AS-top does not need to be
   propagated to another node of inter-AS-top. The entire routing
   information of the inter-AS-top layer needs to be propagated to the
   inter-AS-bottom layer. So, each router of the inter-AS layer will
   have two tables of information, one for the inter-AS-top layer and
   another for the inter-AS-bottom layer of the inter-AS-top node that
   it belongs to.
   BGP (with little modification) will work very well with a trick
   applied at the SBRs: an SBR will not propagate the inter-AS-bottom
   routing information of its own domain to an SBR of a neighboring
   domain. That is, the SBR of one top layer node will propagate only
   inter-AS-top routing information to the SBR of another top layer
   node. Inside a node of inter-AS-top, both inter-AS-top and
   inter-AS-bottom routing information need to be propagated from one
   ASBR to a neighboring ASBR. Inside a top layer node A, the routing
   information for another top layer node B will have two parts: the
   list of SBRs through which a packet will traverse from top layer
   node A to B, and the list of ASBRs through which the packet will
   traverse from one AS to another inside A. In terms of BGP, the
   AS_PATH attribute will be split into two parts, one for the top
   layer and another for the bottom layer. Within the same node A, the
   routing information from one AS to another AS will not have any top
   layer information, i.e. the top layer part will be set to NULL.

   Similarly, each node of the AS layer will have three tables of
   routing entries: one for the inter-AS-top layer, one for the
   inter-AS-bottom layer and another for the routing information
   inside the Autonomous System itself.

   Introducing hierarchy at the inter-AS layer reduces the size of the
   routing table substantially. If hardware resources allow a flat
   address space to be maintained at each layer, the problems related
   to CIDR can be avoided. With a flat address space, no hierarchical
   relationship needs to be established between any two nodes in the
   same layer. So, all the nodes inside each layer can be used until
   they are exhausted. With a flat address space (i.e. without prefix
   reduction), BGP tables will have at most nMaxInterASTopNodes +
   nMaxInterASBottomNodes entries.
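
   The bound on BGP table entries stated above can be checked
   numerically. The following is a minimal sketch, not part of any
   BGP implementation: the function names are hypothetical, and the
   example values pA = 10 and pB = 16 are the ones this document
   proposes in Section 3.2.1. It compares the hierarchical bound
   (2^pA + 2^pB) with a single flat inter-AS layer covering the same
   pA + pB bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: upper bound on BGP table entries when each
 * inter-AS layer is flat (no prefix reduction). A router holds one
 * entry per top layer node plus one entry per bottom layer node of
 * its own top layer node. */
static uint64_t hier_bgp_entries(unsigned pA, unsigned pB)
{
    assert(pA < 64 && pB < 64);
    return (UINT64_C(1) << pA) + (UINT64_C(1) << pB);
}

/* Hypothetical helper, for comparison: a single flat inter-AS layer
 * spanning pA + pB bits needs one entry per inter-AS node. */
static uint64_t flat_bgp_entries(unsigned pA, unsigned pB)
{
    assert(pA + pB < 64);
    return UINT64_C(1) << (pA + pB);
}
```

   With pA = 10 and pB = 16, hier_bgp_entries() gives 66560 entries,
   whereas flat_bgp_entries() gives 67108864 for the same 26 bits.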

   An IGP such as OSPF has provision to divide an AS into smaller
   areas. OSPF hides the topology of an area from the rest of the
   Autonomous System. This information hiding enables a significant
   reduction in routing traffic. With the support of subnetting, OSPF
   attaches an IP address mask to a route to indicate the range of IP
   addresses described by that route. This reduces the routing traffic
   because the nodes inside the range need not be described
   individually, but it introduces another level of hierarchy. If
   subnetting can be avoided in the AS layer (at the cost of
   additional computation inside the SPF tree), each area can be
   configured dynamically from a free pool of addresses based on its
   requirement. So, an AS can be divided into a number of areas of
   heterogeneous sizes with nodes taken from a free pool of address
   space.

   Similarly, the concept of an area can be introduced in the
   inter-AS-bottom layer in the way it works in OSPF. The area border
   routers in the inter-AS-bottom layer have to behave exactly as an
   ABR behaves in OSPF, i.e. an area border router will hide the
   topology inside an area from the rest of the world and will
   distribute the information collected inside the area to the rest.
   It will also distribute the routing information collected from
   outside to the nodes inside. In order to implement this, the
   protocol running in the inter-AS layer (say, BGP) will have to
   introduce a 'cost' factor. This cost factor can be interpreted as
   the cost of propagating a packet from one AS to another. The
   protocols running inside the AS layer (RIP, OSPF, etc.) will have
   to supply the cost information for a packet to travel from one ASBR
   to another. All the protocols must behave in unison in supplying
   this information.
   The cost factor is needed by a remote node sending a packet to a
   node inside an area when more than one area border router is
   equidistant from that remote node. Thus the inter-AS-bottom layer
   (i.e. one inter-AS-top level node) can be divided into a number of
   areas of heterogeneous sizes with AS nodes taken from a free pool
   of address space. BGP adopts a technique called route aggregation,
   which reduces the routing information within a message. In a
   similar manner, introducing areas inside the inter-AS-bottom layer
   will not only reduce the complexity of the protocol, but will also
   reduce the size of a BGP packet substantially.

   With this architecture, each node (router) inside an AS is
   represented as A.B.C. Each node may or may not be attached to a
   network, which acts as a leaf node (i.e. a network will not act as
   a transit). In order to use the user-id space properly and to
   support customer networks of heterogeneous sizes, the user-ID space
   needs to be divided into subnet-ID and user-ID. In effect, a VLSM
   (variable length subnet mask) type of approach (in the form of a
   tree) has to be adopted at each node of an AS. So, each node of the
   AS layer will act as the root of a tree whose leaves are
   independent small customer networks that act as stubs. As the
   routing information of the inter-AS layer and the AS layer need not
   be passed to any node inside the VLSM tree, each router inside the
   tree should maintain a default route for any address outside its
   network/domain. With this approach, the load on each router of the
   service providers will become negligible. Protocols that support
   VLSM with MPLS/VPN have to be implemented inside the tree [11].
   Inside the VLSM tree, all the physical ports of a switch have to be
   configured with the subnet mask.
   A lightweight routing protocol can be developed on top of a static
   routing table by setting the default route inside the VLSM tree.

   The fundamental assumptions on which this architecture rests can be
   summarized as follows:

   i) The entire network can be viewed as a network of regions or
   states, where each region or state can have its own identity by
   communicating with the rest of the world through some state border
   routers. Each region or state is a network of Autonomous Systems.
   Each region, as well as each Autonomous System inside it, will have
   a fixed (maximum) prefix length.

   ii) Hardware resources are available such that a flat address space
   can be maintained at the inter-AS layer.

   Introducing a mesh structured hierarchy will have several
   advantages:

   o  The load at each router will be reduced substantially.
   o  The CIDR style approach and the complexity related to prefix
      reduction can be easily avoided.
   o  A mesh structured hierarchy will distribute traffic evenly.
   o  Physical cable connections can be optimized.
   o  Administrative issues will become easier.

3.2. Determination of prefix lengths

   With this architecture, an IP address can be described as A.B.C.D,
   where the D part represents the user id. Each router in the
   inter-AS layer will have two tables of information, one for the
   inter-AS-top layer and another for the inter-AS-bottom layer of the
   inter-AS-top node that it belongs to. Each node of the AS layer, on
   the other hand, will have three tables of routing entries: one for
   the inter-AS-top layer, one for the inter-AS-bottom layer and
   another for the routing information inside the Autonomous System
   itself. In the worst case, a node inside an AS needs to maintain
   nMaxInterASTopNodes + nMaxInterASBottomNodes + nMaxASNodes entries
   in its routing table.
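
   The A.B.C.D view of an address and the worst-case table size above
   can be illustrated as follows. This is only a sketch under stated
   assumptions: the type and function names are hypothetical, the
   field widths (10, 16, 12 and 24 bits) are the ones proposed in
   Section 3.2.1, and packing the fields contiguously after a 2-bit
   address-type prefix (00 for provider assigned space) is an
   assumption made here for illustration, not a wire format defined
   by this document.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths in bits: 2-bit type prefix, then A
 * (inter-AS-top), B (inter-AS-bottom), C (AS node), D (user id). */
enum { W_PREFIX = 2, W_A = 10, W_B = 16, W_C = 12, W_D = 24 };

struct rip_addr {          /* hypothetical name */
    uint32_t a, b, c, d;
};

/* Pack prefix and A.B.C.D into one 64-bit value, most significant
 * field first. */
static uint64_t rip_pack(unsigned prefix, struct rip_addr f)
{
    assert(prefix < (1u << W_PREFIX));
    assert(f.a < (1u << W_A) && f.b < (1u << W_B));
    assert(f.c < (1u << W_C) && f.d < (1u << W_D));
    uint64_t v = prefix;
    v = (v << W_A) | f.a;
    v = (v << W_B) | f.b;
    v = (v << W_C) | f.c;
    v = (v << W_D) | f.d;
    return v;
}

/* Recover A.B.C.D from a packed value (the prefix is discarded). */
static struct rip_addr rip_unpack(uint64_t v)
{
    struct rip_addr f;
    f.d = (uint32_t)(v & ((UINT64_C(1) << W_D) - 1)); v >>= W_D;
    f.c = (uint32_t)(v & ((UINT64_C(1) << W_C) - 1)); v >>= W_C;
    f.b = (uint32_t)(v & ((UINT64_C(1) << W_B) - 1)); v >>= W_B;
    f.a = (uint32_t)(v & ((UINT64_C(1) << W_A) - 1));
    return f;
}

/* Worst-case routing entries at a node inside an AS: one table per
 * layer, each filled to its maximum. */
static uint64_t worst_case_entries(void)
{
    return (UINT64_C(1) << W_A) + (UINT64_C(1) << W_B)
         + (UINT64_C(1) << W_C);
}
```

   With these widths the five fields add up to exactly 64 bits, and
   worst_case_entries() evaluates to 1024 + 65536 + 4096 = 70656.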

   Areas will be allocated dynamically from a free pool of address
   space more frequently at the AS layer than at the inter-AS-bottom
   layer. As OSPF supports all the features needed, it can be
   considered the default choice in the AS layer. The existing
   implementation of OSPF (Version 2) supports subnetting, by which an
   entire area can be represented as a combination of network address
   and subnet mask. With this approach, the routing table is reduced
   substantially. With the removal of subnetting, every node inside an
   area will have an entry in the routing table (OSPF Version 1). So
   the determining factor is the maximum number of nodes inside an AS
   that OSPF can support once subnetting support is removed. The
   prefix length of the AS layer will be determined by this factor.

   With the introduction of hierarchy in the inter-AS layer, the
   number of entries in the BGP routing table will be reduced
   substantially. Even if pA and pB are both chosen as 16, the number
   of routing entries stays within the range that the existing BGP
   protocol can handle. However, it is the responsibility of IANA to
   come out with a scheme for how nMaxInterASTopNodes and
   nMaxInterASBottomNodes are to be selected. Each top level node will
   have nMaxInterASBottomNodes nodes. It will be a waste of address
   space if each country is assigned a top level node (e.g. China has
   a population of 1,306,313,800 people whereas Vatican City has only
   920, according to a census of 2006). So a moderate value of
   nMaxInterASBottomNodes is desirable, with which larger countries
   will have a number of top level nodes, e.g. each state of the USA
   can be assigned a top level node. With the introduction of areas in
   the inter-AS-bottom layer, each top level node can be divided into
   a number of areas of heterogeneous sizes.
   So, a group of neighboring countries with smaller populations can
   share the address space of a top level node. Similarly, the size of
   the user-id space has to be decided based on the largest area that
   a VLSM tree should span. All these issues are completely
   geopolitical and have to be decided by IANA.

3.2.1. A pseudo optimal distribution of prefixes in a 64 bit
       architecture

   In order to make optimal use of cable connections, the length of
   the VLSM tree is expected to be as short as possible. Also, any
   single organization may prefer to have its user id space under the
   same network id. So, a 16 bit user-id may be insufficient for
   places like a large university campus, whereas 32 bits will be too
   large. Hence, a 24 bit user-id will be a moderate choice; it equals
   the class A address space in IPv4 (also used as the space for
   private IP). As published in 1998 [6], OSPF can support an area
   with 1600 routers and 30K external LSAs. So, 11 bits are needed to
   support this space. On the assumption that OSPF can support a much
   larger address space as hardware technology advances, and to keep
   room for future expansion, 12 bits are assigned to the AS layer.
   16 bits are assigned to the inter-AS-bottom layer. So, if on
   average a 16 bit equivalent space is used within the user-id space
   (i.e. one out of 256) and an 8 bit equivalent number of nodes is
   used inside an AS (16% of 1600), a top level node (with 16 bit
   equivalent AS nodes) will generate 2^40 IP addresses, which will
   give 8629 IP addresses per person in Japan (with a population of
   127417200; Japan is at the 10th position from the top in the
   population list of the world).
   So, even if every country with a population less than or equal to
   that of Japan is assigned a top level node and every province or
   state of the countries with larger populations is assigned a top
   level node, the total number of nodes will be well under 1024. If
   several neighboring countries with smaller populations share a top
   level node, the total number of top level nodes will come down
   further. This suggests that a 62 bit equivalent
   (10(pA)+16(pB)+12(pC)+24(user-id)) space will be good enough for
   unicast addresses. This distribution expects OSPF to support 65K
   (64K+1K) external LSAs.

   The distribution of the address space will be finalized in
   consultation with IANA. Provisionally, it may appear as follows:

   The 64 bit address space may be divided into two 63-bit blocks:

   i. Global unicast addresses, with the most significant bit set to
   0. This space is divided equally between provider assigned (PA)
   address space and provider independent (PI) address space.

   a) Provider assigned address space, with prefix 00.

   b) Provider independent (PI) address space, with prefix 01.
   Provider independent address space will be used for customers who
   would like to retain their number even after changing their
   providers. As routing will be based on PA addresses, each PI
   address will be associated with at least one PA address. The most
   significant aspect of PI addressing is that it is independent of
   the architectural framework of the provider network; even if the
   architectural framework changes, the same format of PI addressing
   can be maintained. Once implemented, the PI address of a node will
   be the number generally used by common people. Section 4 describes
   issues related to PI addressing in detail.

   ii. Address space with the MSB set to 1 will be distributed among
   the remaining uses. Each of them will have a fixed prefix.
   This distribution will be based on the requirements and the work
   that has already been done in connection with IPv6:

   a) Address space for multicasting, with the prefix set to 1001.

   b) Address space for link-local addresses: link-local addresses
   will have the prefix 1010.

   c) Router address space: prefix 1111 will be used by routers inside
   VLSM trees and 1110 will be used by the backbone routers connecting
   them.

   d) Address space for private IP: each customer network can maintain
   a private address space for communication among its users. This
   space will be distributed among all the customer sites of a
   corporation that maintains VPN services. A 32 bit address space
   should be good enough for private IP. The private address space
   will have a 32 bit prefix, with the leading 4 bits set to 1100 and
   the remaining bits set to 1.

   The rest of the address space is reserved for future use.

3.2.2. Whether to go for a two-tier or three-tier hierarchy

   Establishing hierarchy in the inter-AS layer reduces the number of
   BGP entries to a great extent, but leads to improper use of the
   address space for geopolitical reasons. If the hierarchy in the
   inter-AS space is removed, the entire 26 bit (10+16) space will be
   available for a single layer and the inter-AS space will be used to
   the full, but the number of external LSAs (and/or the number of
   entries in the BGP table) will increase dramatically. So, the
   choice depends on the extent to which OSPF can support external
   LSAs. BGP expects the packet length to be limited to 4096 bytes.
   BGP manages to work within this limitation through prefix reduction
   in the CIDR based environment. As the number of inter-AS nodes
   increases, BGP has to change this limit in order to work in a flat
   address space. The alternative is to divide the inter-AS space into
   a number of areas as defined in Section 3.1.
   The area border routers will advertise the aggregated information
   to the rest of the world. BGP may have to incorporate both options
   at the same time. As the number of nodes in the inter-AS layer
   increases, the inter-AS space has to be split into two separate
   planes in order to reduce the number of entries in the routing
   table. So, a two-tier hierarchy can be considered an interim state
   on the way to a three-tier hierarchy. If it turns out that the
   currently available data is good enough to support the present
   need, it will be worth examining to what extent it can support
   future needs. Assignment of inter-AS nodes in a two-tier hierarchy
   should be based on geographical distribution, as if it were part of
   a three-tier hierarchy. Otherwise, introducing a three-tier
   hierarchy in the future will become another difficult task.
   According to a report from 2011, BGP supports ~400,000 entries in
   the routing table. With this growing trend, BGP may have to change
   the limit on packet length even in a CIDR based environment. With
   the introduction of a two-tier hierarchy, the number of entries in
   the routing table will come down drastically, and with the
   three-tier approach it will come down further.

3.3. Issues related to Satellite communications

   Establishing hierarchy in the inter-AS layer assumes that the only
   way two autonomous systems in two different top level nodes
   communicate is through their SBRs. If two autonomous systems inside
   the same top level node communicate through satellite, this will be
   considered a direct link between them. Whenever autonomous system
   'ASa' of top level node 'A' communicates with autonomous system
   'ASb' of top level node 'B' through satellite, they have to go
   through their state border routers, i.e. a satellite port inside
   'A' that communicates with a satellite port inside 'B' will be
   considered a state border router.
If multiple such ports exist inside node 'A', all of them will be equidistant from any port inside 'B'. This expects any satellite port inside 'B' to have prior knowledge of the list of autonomous systems that are under the purview of each port inside 'A'. So, all the satellite ports of 'A' have to exchange such groups of information with all the satellite ports of 'B' and vice versa. Each such group of autonomous systems can be considered a cluster of autonomous systems inside an area of a top level node. If the number of such ports is small, some heuristics can be applied while assigning AS numbers in order to reduce the processing time during the circuit establishment phase. It will become difficult to maintain such heuristics once the number of such ports becomes large. So, in the case of satellite communication, the advantage of establishing hierarchy inside the inter-AS layer diminishes as the number of satellite ports increases. If a private corporate maintains its own satellite channel to communicate between its offices at distant locations, all of these offices are going to be considered as under the user-id space of its network. Service providers that provide satellite services to end-site customers can operate in the usual manner, as they will provide connections to customer networks that act as stubs.

4. Provider Independent addressing, name services and multihoming

Provider independent addressing can be conceived as naming a host with a number. It can be used by customer networks that would like to retain their number even after changing their service provider; it is also useful for designating a host uniquely if the customer network is multihomed. Just as in name services, where the address corresponding to a name needs to be resolved first to initiate communication, the same is required for PI addressing.
Each globally unique PI address will be associated with at least one global unicast provider assigned address. For a host with a single interface, the number of associated addresses will be the same as the number of service providers the customer network is associated with.

As either the source or the destination or both may be multihomed, there could be multiple paths for communication between two hosts. This applies both to name services and to PI addressing.

A system call needs to be introduced to get the source address based on the destination address. If an application program needs to use the destination address directly, it needs to use this system call.

   int getcommaddr(int sockfd, struct in_addr *dst,
                   struct addr_pair *endpts);

'addr_pair' holds the addresses of the communication end points as follows:

   struct addr_pair {
       struct in_addr src;
       struct in_addr dst;
   };

'getcommaddr'[10] returns the number of source-destination pairs for communication; the parameter 'endpts' will hold the array of these addresses. The array will be sorted based on the best possible route. 'sockfd' is used to get the 'type of service' assigned. So, an application program needs to set its type of service before using this call.

'getcommaddr' needs to call a routine 'getmappedaddr' to resolve the mapped provider assigned addresses of a provider independent address.

   int getmappedaddr(struct in_addr *piaddr, struct in_addr *mpiaddr);

'getmappedaddr' will return the number of mapped addresses and 'mpiaddr' will hold their values.

"Host Identification with Provider Independent Address"[12] resolves the provider assigned addresses corresponding to a provider independent address.

Users may use a name instead of an IP address to reach the destination.
A new system call, 'gethostbynamewithsrcaddr', needs to be introduced as an extension to 'gethostbyname' as follows:

   struct hostent *gethostbynamewithsrcaddr(int sockfd, const char *name,
                                            int *nroutes,
                                            struct addr_pair *endpts);

'gethostbynamewithsrcaddr'[10] takes 'name' and 'sockfd' as input parameters and finds the best possible route to reach the destination. It returns the pointer to the 'hostent' structure as returned by the 'gethostbyname' system call. The parameter 'nroutes' receives the number of possible routes to be used, and the corresponding source and destination addresses are assigned to 'endpts' in sorted order. 'sockfd' is used to get the 'type of service' assigned. So, an application program needs to set its type of service before using this call.

An application program needs to use these source addresses from the top (i.e. the 0th) to establish a connection with the destination. It needs to bind the source address 'src' and then connect to the destination address 'dst'.

5. Issues related to IP mobility

An interface of a customer network may have several IP addresses (e.g. for a multihomed customer site, each interface will have multiple global unicast addresses and may also have private addresses). A mobile node that has moved to a customer network which gets service from a service provider and maintains private IP addresses will have at least three IP addresses: a provider assigned unicast address, a private address and its permanent "Home Address". The "Home Address" will be aliased with the provider assigned address (i.e. the co-located care-of address). So the interface structure needs to have an additional field to hold the value of the care-of address. The PCB structure will have an additional field 'inp_lcladdr'.
So 'inp_lcladdr' will hold the current provider assigned address that a foreign node needs to use for communication. The field 'inp_laddr', which is used to hold the value of the local address, will hold the "Home Address" of a mobile node. Similarly, the PCB needs to introduce another field 'inp_fcladdr' to support a mobile destination. The existing field 'inp_faddr', which is used to hold the foreign address, will hold the "Home Address" of the mobile node. For customers with a PI address who would like to have mobility support, the mapped address will be considered the "Home Address" of the mobile node.

An outgoing packet from a mobile node in a foreign site needs to be stacked with the associated care-of address. While initiating communication, the 'bind' system call needs to go through the interface list and fetch the associated structure to check whether the source address is aliased or not, and needs to fill the value of 'inp_lcladdr' of the PCB accordingly.

When TCP receives a SYN for connection establishment, it allocates a PCB and assigns the values for 'inp_laddr' and related fields. During this phase, TCP also needs to check whether the local address is aliased or not (based on the fields of the interface structure; this is applicable for a mobile node at a foreign site) and needs to fill the value of 'inp_lcladdr' accordingly. Similarly, if the destination address is found to be aliased, based on the stacking type, it needs to fill in the field 'inp_fcladdr'.

IP address stacking can be performed with the approach introduced in section 6.4 of RFC6275[7]. RFC6275 talks about the stacking of IP addresses for a destination address (let us call it type 0 stacking).
Two more types of stacking need to be introduced: type 1 stacking, where only the source address appears in the stack, and type 2 stacking, where both the source address and the destination address appear in the stack with a particular ordering.

Protocol output routines like 'tcp_output' or 'udp_output' need to fill the IP packet in the following manner.

If the socket contains a valid 'inp_lcladdr', use 'inp_lcladdr' as the source address, and 'inp_laddr' will appear in the stack. If the socket contains a valid 'inp_fcladdr', use 'inp_fcladdr' as the destination address, and 'inp_faddr' will appear in the stack. If only 'inp_fcladdr' contains a valid address whereas 'inp_lcladdr' is NULL, use type 0 stacking. If only 'inp_lcladdr' contains a valid address whereas 'inp_fcladdr' is NULL, use type 1 stacking. If both 'inp_lcladdr' and 'inp_fcladdr' contain valid addresses, use type 2 stacking.

Protocol input routines like 'tcp_input' or 'udp_input' need to process the packet in the reverse order based on the type of stacking. For type 0 stacking, use the address in the stack as the destination address; for type 1 stacking, use the address in the stack as the source address; for type 2 stacking, use both the source address and the destination address from the stack.

5.1. Changes expected with the specifications related to IP mobility

RFC6275 demands correspondent node binding from mobile nodes for route optimization. This binding is required when a connection gets established as well as when the mobile node changes its address space. There are applications like HTTP which open multiple, very short lived connections at run time. If mobile nodes need to send binding messages for all the connections, the network will be unnecessarily congested.
This congestion can be avoided by establishing the binding at the time of connection establishment itself. So, if the TCP server happens to be mobile, it will set the value of 'inp_lcladdr' in the stack while sending SYN+ACK. The TCP client which initiates communication through 'connect' needs to set the 'inp_fcladdr' field on receiving SYN+ACK. With this approach, correspondent node binding messages need to be sent only when a mobile node moves from one address space to another.

Route optimization is not applicable to applications of multicast type. In these cases packets need to be forwarded through reverse tunneling with "IP Encapsulation within IP" as defined in RFC2003. In order to support packet delivery with route optimization as well as with "Encapsulating Delivery Style", depending on the application type, the protocol control block needs to introduce another field 'inp_hagentaddr' to hold the address of the home agent of the mobile node. The interface structure also needs to have the same field. The 'bind' system call needs to go through the interface list to fetch 'inp_hagentaddr' into the PCB along with 'inp_lcladdr' as described earlier. So, protocol output routines like 'tcp_output' and 'udp_output' need to fill the packets based on the application type. In "Encapsulating Delivery Style", packets need to be formed in the following manner.

   The inner IP header will contain
      Source Address: Home address of the mobile node
                      (i.e. 'inp_laddr')
      Destination Address: Address of the correspondent node
                      (i.e. 'inp_faddr')
   The outer IP header will contain
      Source Address: Co-located care-of address of the mobile node
                      (i.e. 'inp_lcladdr')
      Destination Address: Address of the home agent of the mobile node
                      (i.e. 'inp_hagentaddr')
      Protocol field: IP in IP

6. Refinements over existing IPv6 specification

As IPv6 was envisioned long before some of the newer technologies, e.g. MPLS, came into the picture, some refinements can be made to the existing specification. These considerations are related to bandwidth usage and performance inside switches. Experimental results show that a smaller packet size gives better results for the processing of RT packets. So, it is desirable for the IP packet header to be as small as possible.

As described earlier, evaluation of the parameters nMaxInterASTopNodes, nMaxInterASBottomNodes and nMaxASNodes is geo-political and has to be decided by IANA. Once these parameters are determined by mutual agreement, the values of pA, pB, pC and the prefix length of the user id can be determined. With a 64 bit address space, the IP header will be reduced by 16 bytes, since each of the source and destination addresses shrinks from 16 bytes to 8 bytes.

The 'flow label' field of the IPv6 packet header may not be of any use when MPLS is in use. ATM used to have 4 priority classes. The first specification of IPv6, RFC-1883, used a 4 bit type of service field along with a 24 bit flow label field. These two were modified to an 8 bit type of service field and a 20 bit flow label field in the current spec, RFC-2460. Too many priority classes may increase the complexity of processing inside switches. If the type of service field of the IPv6 header is reduced to 4 bits, as it was in RFC-1883, and the 'flow label' field is removed, another three bytes may be removed from the IPv6 header.

The field 'Hop Limit' has an 8 bit value in the existing spec. The role of this field needs to be discussed properly with a large address space.

RFC4862[8] introduces the concept of "Stateless auto configuration" with the goal that no manual configuration is required by individual machines before connecting them to the network.
It first generates a link-local address from a link-local prefix and the link address (e.g. Ethernet/E.164 for ISDN). This link-local address is used to configure the global unicast address and any other configurable parameters based on router advertisement. Global unicast addresses are generated from the prefix supplied by the router advertisement and the link specific interface identifier. This identifier can be as large as 64 bits in length. So, irrespective of the size of the network (it may be 10000 or 100 or even less than that), every subnet of a customer network will consume the equivalent of a 64 bit address space. This seems to be a huge blunder. What is expected is that the length of the interface identifier is just sufficient to support the number of nodes in that subnet. In order to achieve this, the router itself or a server in that subnet needs to maintain a store from which the interface identifier is generated on request from individual hosts. It may be desirable that interface identifiers are generated by DHCP servers. With the option of generating the interface identifier through DHCP, changes in the auto configuration process can be looked at as follows:

From the point of view of a host, it can be considered a two step process. The host needs to send a Router Solicitation message to find out the presence of a router. The Router Advertisement message should include an option field which will inform whether prefix information should be configured through Router Advertisement or through DHCP. The host then needs to send a request message to get the interface identifier. If both pieces of information need to be obtained from a DHCP server, they can be obtained through a single message.

From the server's point of view, it needs to maintain a database that maps the link-layer address to the subnet specific interface identifier.
The lifetime of an interface identifier has to be processed in the usual manner, the way existing DHCP implementations treat IP addresses.

There seems to be another possible danger in obtaining prefix information through Router Advertisement. As the Router Advertisement comes in the form of ICMP messages, once it is received by the ICMP layer, it loses the information about the interface from which the message was received (this problem arises for hosts that have multiple interfaces, not all of which are attached to the same subnet). So, auto configuration of a host has to be performed one interface at a time, with all other interfaces disabled. Once configuration of all the interfaces is done, all of them have to be enabled.

If it is expected that hosts should reconfigure their addresses dynamically based on the Router Advertisement message, the router needs to generate, for a certain amount of time, a special Router Advertisement message that includes the old prefix and the corresponding new prefix.

In order to support multihoming[10], prefix information needs to include the fields 'default router' and 'next hop address' to reach the default router for each of the prefixes.

In a 64 bit architecture, the link-local address can be formed from a link-local prefix and the link-layer address in a suitable manner; say, it can be formed with a 4 bit link-local prefix followed by a 60 bit link-layer address. IPv6 supports the Modified EUI-64 format for hardware that supports 48 bit addressing by inserting a padding of 16 bits (FF FE) between the company_id and the manufacturer selected extension identifier. In order to make things work, this padding has to be reduced to 12 bits. Hardware that supports the E.164 format uses a 15 digit number in BCD format followed by a padding of four bits set to 1111.
Thus in this case, the link-local address can be formed with the link-local prefix followed by the most significant 60 bits of the E.164 format.

Section 3.1 of RFC 7421[9] states "It is sometimes suggested that assigning a prefix such as /48 or /56 to every user site (including the smallest) as recommended by [RFC6177] is wasteful. In fact, the currently released unicast address space, 2000::/3, contains 35 trillion /48 prefixes (2**45 = 35,184,372,088,832), of which only a small fraction have been allocated. Allowing for a conservative estimate of allocation efficiency, i.e., an HD-ratio of 0.94 [RFC4692], approximately 5 trillion /48 prefixes can be allocated. Even with a relaxed HD-ratio of 0.89, approximately one trillion /48 prefixes can be allocated. Furthermore, with only 2000::/3 currently committed for unicast addressing, we still have approximately 85% of the address space in reserve. Thus, there is no objective risk of prefix depletion by assigning /48 or /56 prefixes even to the smallest sites."

So, each customer network can be assigned a /48 prefix, i.e. an 80 bit address space.

In IPv4, class A (24 bits), class B (16 bits) and class C (8 bits) networks were classified with the thought in mind that there would be very few large networks (class A), a large number of mid sized networks (class B) and a very large number of small sized networks (class C). If we go back to the assignment of address space in IPv4, before the emergence of CIDR, the class B address space was getting exhausted very fast. Moreover, it was realized that the 16 bit class B address space is far too large compared to the requirement of most of the mid sized networks [2]. So, if we look at the actual need of customer networks, on average a network needs less than a 16 bit (say, m bit) address space.
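The arithmetic behind the figures above can be checked with a small C sketch. It is only an illustration: the function names 'subprefix_count', 'site_bits' and 'unused_site_bits' are hypothetical and are not part of this document or of RFC 7421, and the sketch covers only the exact power-of-two counts, not the HD-ratio estimates.

```c
#include <assert.h>
#include <stdint.h>

/* Number of prefixes of length child_len contained in one prefix of
 * length parent_len. The difference must be below 64 so that the
 * result fits in a 64 bit integer. */
uint64_t subprefix_count(unsigned parent_len, unsigned child_len)
{
    assert(parent_len <= child_len && child_len - parent_len < 64);
    return (uint64_t)1 << (child_len - parent_len);
}

/* Bits left to a site below its prefix, for addresses of addr_bits bits. */
unsigned site_bits(unsigned addr_bits, unsigned prefix_len)
{
    assert(prefix_len <= addr_bits);
    return addr_bits - prefix_len;
}

/* Bits of a site allocation that stay unused when the site actually
 * needs only need_bits bits (the "m" of the text). */
unsigned unused_site_bits(unsigned addr_bits, unsigned prefix_len,
                          unsigned need_bits)
{
    unsigned avail = site_bits(addr_bits, prefix_len);
    assert(need_bits <= avail);
    return avail - need_bits;
}
```

With these helpers, subprefix_count(3, 48) yields 2**45 = 35,184,372,088,832, the number of /48 prefixes in 2000::/3 quoted from RFC 7421, and site_bits(128, 48) yields the 80 bits left to a site under a /48 prefix. For a site that needs m = 16 bits, unused_site_bits(128, 48, 16) yields 64, and any smaller m leaves more than 64 bits unused.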
So, if an 80 bit address space is used for each customer network in IPv6, more than 64 bits will remain unused on average. In effect, out of 128 bits, less than 64 bits will be of actual use, i.e. if RFC 7421 justifies a 128 bit address space as good enough for the needs of this world, a 64 bit address space will satisfy the needs of this world when customer networks are assigned address space based on their sizes.

Wherever one network is satisfied with an 80 bit address space based on RFC 7421, 2^(16-m) networks are satisfied with a 16 bit address space if customer networks are assigned address space based on their sizes. If a total of M networks with /48 prefixes can be satisfied with a 128 bit address space based on RFC 7421, a total of M*2^(16-m) networks will be satisfied with a 64 bit address space once networks are assigned address space based on their sizes.

7. Distributed processing and Multicasting

With the inherent hierarchy involved in this architecture, distributed applications can also be structured in a suitable manner. Say, for a commonly used web based application, a master level server will be there at every top level node. Any change that might happen in the application has to be synchronized among these master level servers first. There might be servers at the middle layer (inside each inter-AS-bottom) inside each top level node. Once the changes are reflected at the master node, all the servers at the middle layer need to update themselves from their master level node. This will reduce network traffic substantially. The inherent hierarchy in the architecture will also help in establishing multicast trees in a similar manner. Work on these issues can progress only after this architecture gets approved.

8. Transition to real IP from private IP

Both CIDR and mesh structured hierarchy expect a VLSM tree at the bottom.
In VLSM, in real IP space with provider assigned (PA) addresses, assignment of network resources has to be associated with the address space to be used and with the type of service. Within a typical switch supporting multiple types of ports, a line card of strength OC48 can be replaced with 4 line cards of strength OC12. An OC12 card may also be replaced with 4 OC3 cards. An OC12 card may be attached to another switch with DS3 ports, and so on. When it reaches the customer network, the port density of a switch has to be directly proportional to the address block that a customer network will be assigned, i.e. each customer network has to be assigned a block of address space (say, 128, 256, 512, 1K, 2K etc). Within the switch these ports have to be assigned a net address/net mask the way VLSM works.

In the IPv4 environment, providers have provided services in terms of the bandwidth of the ports, say a 2 Mbps/4 Mbps/1 Gbps line etc. If these ports were assigned addresses based on the number of users of the customer network, the transition from private IP to real IP is simple. Consider a switch that has supplied a 2 Mbps line to a set of customers, each with between 1K and 2K users; each of them will be assigned a block of 2K. But if the number of users is not proportional to the bandwidth used, say the same 2 Mbps line was supplied to customers of sizes 1K, 2K, 4K and 16K respectively, reorganization will be needed if possible. This rearrangement may be possible within the switch itself or by connecting ports of appropriate sizes from a different switch; otherwise each of them has to be assigned an address block of 16K, or whatever is suitable with the way VLSM works. So, address block assignment in the VLSM tree has to grow in a bottom up approach.
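The block sizing described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a definitive procedure: the function names 'vlsm_block_size', 'vlsm_host_bits' and 'vlsm_prefix_len' are hypothetical, blocks are assumed to be powers of two, and every address in a block is assumed to be usable by the customer.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power-of-two address block that can hold `users` addresses. */
uint64_t vlsm_block_size(uint64_t users)
{
    assert(users > 0 && users <= ((uint64_t)1 << 63));
    uint64_t block = 1;
    while (block < users)
        block <<= 1;
    return block;
}

/* Number of host bits of that block, i.e. log2 of the block size. */
unsigned vlsm_host_bits(uint64_t users)
{
    uint64_t block = vlsm_block_size(users);
    unsigned bits = 0;
    while (block > 1) {
        block >>= 1;
        bits++;
    }
    return bits;
}

/* Prefix (net mask) length of the block in an address of addr_bits bits. */
unsigned vlsm_prefix_len(unsigned addr_bits, uint64_t users)
{
    unsigned host = vlsm_host_bits(users);
    assert(host <= addr_bits);
    return addr_bits - host;
}
```

For example, a customer with 1500 users gets vlsm_block_size(1500) = 2048, i.e. a 2K block with 11 host bits, which in a 64 bit address corresponds to a prefix length of 53. A customer with 16K users needs 14 host bits, so four such customers on identical ports would each occupy a 16K block unless the ports are rearranged.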
Thus, transition of an existing provider network to a real IP space with a CIDR based approach, with little or no rearrangement, is apparently not a difficult job. In a CIDR based approach, the sizes of the VLSM trees are heterogeneous, which leads to a very high number of routing entries. Mesh structured hierarchy is convenient for reducing the routing overhead as well as for distributing network resources in a suitable manner in the long run. Converting a CIDR based approach to mesh structured hierarchy requires reorganization mainly in the routing domain, and splitting trees of very large sizes (>24 bit address space) at the top.

Mesh structured hierarchy makes use of a large address space and distributes the entire space into some regions, and sub regions inside each region, maintaining a flat address space in each layer for the convenience of routing and distribution. It shows that a 64 bit address space is good enough for all practical purposes. If address space is assigned based on the actual need of the customer networks, there will be a lot of unused address space within the 64 bit address space. If CIDR based hierarchy is maintained, the unused address space will be much larger.

9. IANA Consideration

This document does not include any IANA related issues.

10. Security Consideration

This document does not include any security related issues.

11. Acknowledgments

The author would like to thank Professor Amitava Datta of the University of Western Australia for his review and constructive comments.

12. Normative References

[1] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers", RFC 4213, October 2005.

[2] Fuller V., Li. T., "Classless Inter-Domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan", RFC 4632, August 2006.
[3] Huston, G., "Commentary on Inter-Domain Routing in the Internet", RFC 3221, December 2001.

[4] Q. Vohra, E. Chen., "BGP Support for Four-octet AS Number Space", RFC 4893, May 2007.

[5] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, January 2001.

[6] J. Moy., "OSPF Standardization Report", RFC 2329, April 1998.

[7] C. Perkins, Ed., D. Johnson, J. Arkko, "Mobility Support in IPv6", RFC 6275, July 2011.

[8] S. Thomson, T. Narten, T. Jinmei, "IPv6 Stateless Address Autoconfiguration", RFC 4862, September 2007.

[9] B. Carpenter, Ed., T. Chown, F. Gont, S. Jiang, A. Petrescu, A. Yourtchenko, "Analysis of the 64-bit Boundary in IPv6 Addressing", RFC 7421, January 2015.

[10] S. Bandyopadhyay, "Solution for Site Multihoming in a Real IP Environment", , work in progress.

[11] S. Bandyopadhyay, "VLSM Tree Routing Protocol", , work in progress.

[12] S. Bandyopadhyay, "Host Identification with Provider Independent Address" , work in progress.

13. Informative References

[13] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[14] Rekhter, Y., and T., Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771, March 1995.

[15] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 1883, December 1995.

[16] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

[17] Rosen, E., Viswanathan, A. and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.

14. Author's Address

Shyamaprasad Bandyopadhyay
HL No 205/157/7, Kharagpur 721305, India
Phone: +91 3222 225137
e-mail: shyamb66@gmail.com