INTERNET DRAFT                                          S. Bandyopadhyay
draft-shyam-real-ip-framework-39.txt                       July 23, 2017
Intended status: Experimental
Expires: January 23, 2018

    An Architectural Framework of the Internet for the Real IP World
                  draft-shyam-real-ip-framework-39.txt

Abstract

   This document proposes an architectural framework for the Internet
   in the real IP world. It describes how a three-tier mesh structured
   hierarchy can be established in a large address space by dividing
   it into regions and dividing each region into sub-regions. It
   addresses issues relevant to this architecture in the context of
   IPv6. It also shows how to make the transition from private IP to
   real IP without significant changes to the existing network.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 23, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.
   All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1. Introduction
   2. Background
   3. A Three-tier mesh structured hierarchical network
      3.1. Route propagation
      3.2. Determination of prefix lengths
         3.2.1. A pseudo optimal distribution of prefixes in
                a 64bit architecture
         3.2.2. Whether to go for a two-tier or three-tier hierarchy
      3.3. Issues related to Satellite communications
   4. Provider Independent addressing, name services and multihoming
      4.1. PI address Resolution
         4.1.1. Record Format
         4.1.2. Messages
         4.1.3. Master file and data file
         4.1.4. Zone maintenance and transfers
   5. Issues related to IP mobility
      5.1. Changes expected with the specifications related
           to IP mobility
   6. Refinements over existing IPv6 specification
   7. Distributed processing and Multicasting
   8. Transition to real IP from private IP
   9. IANA Consideration
   10. Security Consideration
   11. Acknowledgments
   12. Normative References
   13. Informative References
   14. Author's Address

1. Introduction

   The transition from IPv4 to IPv6 is in progress. Work has been done
   to upgrade individual nodes (workstations) from IPv4 to IPv6. There
   are also established documents describing how routers and switches
   can support IPv4 and IPv6 packets simultaneously in order to make
   the transition possible [1]. The CIDR [2] based hierarchical
   architecture of the existing 32-bit system is expected to continue
   in IPv6 with a larger address space. Concerns have been documented
   that BGP table entries are becoming too large in the existing
   system [3]. At the same time, there are proposals to extend the
   Autonomous System number from 16 bits to 32 bits to meet demand [4].
   The challenge lies in making the transition from IPv4 to a real IP
   world smooth, with the fewest possible changes.

   The term "real IP environment" refers to an environment where hosts
   in a customer network possess globally unique IP addresses and
   communicate with the rest of the world without the help of NAT [5].
   Where applicable, this document describes the changes required to
   the BSD 4.4 source code.

2. Background

   The existing system operates with an Autonomous System (AS) layer
   and an inter-AS layer using the CIDR approach. To meet demand within
   the 32-bit address space, Autonomous Systems of various sizes
   maintain a CIDR based hierarchical architecture. With the help of
   NAT [5], a stub network can maintain a user ID space as large as a
   class A network and can meet its need to communicate with the rest
   of the world with very few real IP addresses.
   With the combination of CIDR and NAT applied across the entire
   space, most of the 32-bit address space is effectively used as
   network ID. If the same approach is continued with a larger network
   ID, the load on the switches will become too high.

   With a traditional CIDR based hierarchy, a node higher in the
   hierarchy (with a shorter prefix) can be divided into a number of
   nodes with longer prefixes. Each of these nodes can be subdivided
   further in the same way, and the process can continue until no
   further division is possible. At each level, the designer of the
   network has to anticipate future expansion so that the resource is
   never exhausted. This leads the designer to allocate much more than
   is needed, which leaves address space unused and brings the H-D
   (host-density) ratio into play. The problem is aggravated once a
   resource does get exhausted. For example, a node of prefix /16 can
   be divided into a number of nodes of prefix /24. If any one of the
   /24 nodes is exhausted, the resources of the other /24 nodes cannot
   be used for it even if they are available.

   In the IPv4 environment, service providers go to great lengths to
   provide Internet services with the help of NAT. For example, a
   large educational institute meets its current requirement with 4
   real IP addresses: one for its mail server, one for its web server,
   one for its ftp server and one for its proxy server, which provides
   web based services to all of its users. These four types of service
   are used by organizations of any size (whether 400 users or
   40000). In the current provider network, the needs of these
   organizations are met with 4 IP addresses each, and the CIDR based
   tree has been built out of these components.
When private IP will be 141 replaced with real IP, each customer network will require IP 142 addresses based on its size and requirement. Transitioning to real IP 143 space with provider assigned addresses with CIDR based approach 144 itself without reorganization of the existing provider network may 145 not be a difficult task. This will continue with all the problems 146 associated with routing and problems related to distribution. Mesh 147 structured hierarchy is convenient to reduce the routing overhead as 148 well as for distribution of network resources in a suitable manner in 149 the long run. 151 3. A Three-tier mesh structured hierarchical network 153 As Autonomous Systems of various sizes are supported, Autonomous 154 Systems and the nodes inside the Autonomous Systems can be viewed as 155 graphically lying on the same plane within the address apace. If 156 network can be viewed as lying on different planes, routing issues 157 can be made simpler. If network is designed with a fixed length of 158 prefix for the Autonomous System everywhere, routing information for 159 the rest will get confined with the other part of the network prefix. 160 Which means the maximum size of AS gets assigned to all irrespective 161 of their actual sizes. This can be made possible with the advantage 162 of using a large address space and dividing it into number of regions 163 of fixed sizes inside it. Thus entire network can be viewed as a 164 network of inter-AS layer nodes. Each node in the inter-AS layer can 165 act either only as a router in the inter-AS layer or as a router in 166 the inter-AS layer with an Autonomous System attached to it with a 167 single point of attachment or as an Autonomous System with multiple 168 Autonomous System border routers (ASBR) appearing like a mesh. Thus 169 two tier mesh structured hierarchy gets established between AS layer 170 and inter-AS layer with each AS having a fixed length of prefix. 

   By definition, an Autonomous System is a small area within the
   entire network that maintains its own independent identity and
   communicates with the rest of the world through specific border
   routers. Similarly, if a larger area (say, a region or state) is
   considered as a network of Autonomous Systems that maintains its
   own identity by communicating with the rest of the world through
   border routers (say, state border routers), a mesh structured
   hierarchy can be established within the inter-AS layer. The
   inter-AS layer is then split into inter-AS-top and inter-AS-bottom.
   To maintain this hierarchy, each node of inter-AS-top needs
   multiple regional or state border routers (SBRs) through which it
   communicates with the rest of the world, in the same way that an
   Autonomous System maintains ASBRs. The entire network thus appears
   as a network of inter-AS-top layer nodes. To maintain the
   hierarchy, each node of inter-AS-top needs a fixed prefix length;
   i.e., each node of inter-AS-top is assigned a fixed maximum number
   of Autonomous System nodes.

   Thus, with a three-tier mesh structured hierarchy in the network
   layer, the network ID can be viewed as A.B.C. If pA, pB and pC are
   the prefix lengths of the inter-AS-top, inter-AS-bottom and AS
   layers respectively, there can be up to 2^pA nodes at the topmost
   layer, up to 2^pB nodes in the inter-AS-bottom layer of each top
   level node, and up to 2^pC nodes in each AS. The entire space is
   thus divided into a fixed number of regions, and each region is
   divided into a fixed number of sub-regions. This division is to be
   made on the basis of geography, population density, demand and
   related factors.
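
   The A.B.C view of the network ID can be illustrated with a small C
   sketch that splits a packed address into its per-tier fields and
   joins them again. This is only an illustration under stated
   assumptions: the names 'tier_layout', 'tier_addr', 'tier_split' and
   'tier_join' are hypothetical, the address is assumed to fit in a
   64-bit integer, and the low-order bits left after A.B.C are treated
   as the user ID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths in bits, most significant field first. */
struct tier_layout {
    unsigned pA;        /* inter-AS-top prefix length */
    unsigned pB;        /* inter-AS-bottom prefix length */
    unsigned pC;        /* AS layer prefix length */
    unsigned user_bits; /* low-order bits left for the user ID */
};

/* One address broken into its per-tier fields. */
struct tier_addr {
    uint64_t a;    /* inter-AS-top node */
    uint64_t b;    /* inter-AS-bottom node (an AS) within A */
    uint64_t c;    /* node (router) within the AS */
    uint64_t user; /* user ID below that node */
};

/* Mask with the low 'bits' bits set; 'bits' must be below 64. */
static uint64_t low_mask(unsigned bits)
{
    assert(bits < 64);
    return (UINT64_C(1) << bits) - 1;
}

/* Extract the A, B, C and user fields from a packed address. */
struct tier_addr tier_split(uint64_t addr, const struct tier_layout *l)
{
    unsigned c_shift = l->user_bits;
    unsigned b_shift = c_shift + l->pC;
    unsigned a_shift = b_shift + l->pB;
    struct tier_addr t;

    assert(a_shift < 64);
    t.user = addr & low_mask(l->user_bits);
    t.c = (addr >> c_shift) & low_mask(l->pC);
    t.b = (addr >> b_shift) & low_mask(l->pB);
    t.a = (addr >> a_shift) & low_mask(l->pA);
    return t;
}

/* Pack the fields back into one integer; inverse of tier_split(). */
uint64_t tier_join(const struct tier_addr *t, const struct tier_layout *l)
{
    unsigned c_shift = l->user_bits;
    unsigned b_shift = c_shift + l->pC;
    unsigned a_shift = b_shift + l->pB;

    assert(a_shift < 64);
    assert(t->a <= low_mask(l->pA) && t->b <= low_mask(l->pB));
    assert(t->c <= low_mask(l->pC) && t->user <= low_mask(l->user_bits));
    return (t->a << a_shift) | (t->b << b_shift) | (t->c << c_shift) | t->user;
}
```

   Because every field has a fixed width, each tier can find its own
   part of an address with a shift and a mask instead of a longest
   prefix match. For instance, with an illustrative layout of
   { 10, 16, 12, 24 }, joining the fields and splitting the result
   returns the original A, B, C and user values.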

   Let nMaxInterASTopNodes be the maximum number of nodes that can be
   assigned at the topmost layer, nMaxInterASBottomNodes the maximum
   at the inter-AS-bottom layer, and nMaxASNodes the maximum at the AS
   layer, where nMaxInterASTopNodes <= 2^pA,
   nMaxInterASBottomNodes <= 2^pB and nMaxASNodes <= 2^pC.

3.1. Route propagation

   With the hierarchy established, routing information generated
   inside one node of inter-AS-top does not need to be propagated to
   another node of inter-AS-top. The entire routing information of the
   inter-AS-top layer needs to be propagated to the inter-AS-bottom
   layer. Each router of the inter-AS layer therefore has two tables:
   one for inter-AS-top and another for the inter-AS-bottom of the
   inter-AS-top node to which it belongs. BGP (with minor
   modification) will work well with one rule applied at the SBRs: an
   SBR does not propagate the inter-AS-bottom routing information of
   its own domain to an SBR of a neighboring domain; i.e., an SBR of
   one top layer node propagates only inter-AS-top routing information
   to an SBR of another top layer node. Inside a node of inter-AS-top,
   routing information for both inter-AS-top and inter-AS-bottom needs
   to be propagated from one ASBR to a neighboring ASBR. Inside a top
   layer node A, the routing information for another top layer node B
   has two parts: the list of SBRs through which a packet traverses
   from top layer node A to B, and the list of ASBRs through which the
   packet traverses from one AS to another inside A. In terms of BGP,
   the AS_PATH attribute is split into two parts: one for the top
   layer and one for the bottom layer. Within the same node A, routing
   information from one AS to another AS carries no top layer
   information; i.e., the top layer part is set to NULL.
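
   The propagation rule at the SBRs can be sketched in C as an export
   filter over a route whose path attribute has a top part and a
   bottom part. This is a rough sketch under stated assumptions, not a
   BGP implementation: the types 'split_path', 'route' and 'peer_kind'
   and the function 'export_route' are hypothetical, an empty top part
   stands for NULL, and the way an SBR originates a route for its own
   top level node is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HOPS 16 /* arbitrary bound chosen for this sketch */

/* Hypothetical two-part path attribute (the split AS_PATH). */
struct split_path {
    uint32_t top[MAX_HOPS];    /* top layer hops between top level nodes */
    size_t top_len;            /* 0 stands for NULL: route is local to the node */
    uint32_t bottom[MAX_HOPS]; /* ASes traversed inside the local node */
    size_t bottom_len;
};

/* Hypothetical route entry keyed by the A and B parts of the network ID. */
struct route {
    uint32_t dest_top;    /* destination top level node (A) */
    uint32_t dest_bottom; /* destination AS (B); used for local routes only */
    struct split_path path;
};

/* Kind of neighbor the route is being advertised to. */
enum peer_kind {
    PEER_ASBR_SAME_NODE, /* neighboring ASBR inside the same top level node */
    PEER_SBR_OTHER_NODE  /* SBR of a neighboring top level node */
};

/*
 * Write the advertised form of 'in' to 'out' and return true, or return
 * false if the route must not be advertised to this kind of peer.
 */
bool export_route(const struct route *in, enum peer_kind peer, struct route *out)
{
    assert(in != NULL && out != NULL);

    if (peer == PEER_ASBR_SAME_NODE) {
        *out = *in; /* both parts travel between ASBRs of one node */
        return true;
    }
    if (in->path.top_len == 0)
        return false; /* inter-AS-bottom route: stays inside the node */

    *out = *in;
    out->dest_bottom = 0;
    memset(out->path.bottom, 0, sizeof out->path.bottom);
    out->path.bottom_len = 0; /* bottom part is never sent across an SBR link */
    return true;
}
```

   In this sketch a route with an empty top part is withheld from the
   SBR of another top level node, while a route towards another top
   level node is advertised with its bottom part cleared. Between
   ASBRs of the same node both parts are passed on unchanged.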

   Similarly, each node of the AS layer has three tables of routing
   entries: one for inter-AS-top, one for inter-AS-bottom, and one for
   routing inside the Autonomous System itself.

   Introducing hierarchy at the inter-AS layer reduces the size of the
   routing table substantially. If hardware resources permit a flat
   address space to be maintained at each layer, the problems related
   to CIDR can be avoided. With a flat address space, no hierarchical
   relationship needs to be established between any two nodes in the
   same layer, so all the nodes inside each layer can be used until
   they are exhausted. With a flat address space (i.e., without prefix
   reduction), BGP tables will have at most nMaxInterASTopNodes +
   nMaxInterASBottomNodes entries.

   An IGP such as OSPF has provision to divide an AS into smaller
   areas. OSPF hides the topology of an area from the rest of the
   Autonomous System. This information hiding enables a significant
   reduction in routing traffic. With support for subnetting, OSPF
   attaches an IP address mask to indicate the range of IP addresses
   described by a particular route. This reduces the volume of routing
   traffic, because not every node inside the range has to be
   described, but it introduces another level of hierarchy. If
   subnetting can be avoided in the AS layer (at the cost of
   additional computation in the SPF tree), each area can be
   configured dynamically from a free pool of addresses according to
   its requirements. An AS can thus be divided into a number of areas
   of heterogeneous sizes, with nodes drawn from a free pool of
   address space.

   Similarly, the concept of an area can be introduced in the
   inter-AS-bottom layer, as in OSPF. The area border routers in the
   inter-AS-bottom layer have to behave exactly as an ABR behaves in
   OSPF; i.e.,
   an area border router hides the topology inside an area from the
   rest of the world and distributes the routing information collected
   inside the area to the rest. It likewise distributes the routing
   information collected from outside to the nodes inside. To
   implement this, the protocol running in the inter-AS layer (say,
   BGP) has to introduce a 'cost' factor. This cost factor can be
   interpreted as the cost of propagating a packet from one AS to
   another. The protocols running inside the AS layer (RIP, OSPF,
   etc.) have to supply the cost for a packet to travel from one ASBR
   to another. All the protocols must behave in unison in supplying
   this information. The cost factor is needed by a remote node when
   it sends a packet to a node inside an area and more than one area
   border router is equidistant from that remote node. Thus the
   inter-AS-bottom layer (i.e., one inter-AS-top level node) can be
   divided into a number of areas of heterogeneous sizes, with AS
   nodes drawn from a free pool of address space. BGP adopts a
   technique called route aggregation, which reduces the routing
   information carried within a message. Similarly, introducing areas
   inside the inter-AS-bottom layer will not only reduce the
   complexity of the protocol but also reduce the size of a BGP packet
   substantially.

   With this architecture, each node (router) inside an AS is
   represented as A.B.C. Each node may or may not be attached to a
   network, which acts as a leaf node (i.e., the network does not act
   as a transit). To use the user-ID space properly and to support
   customer networks of heterogeneous sizes, the user-ID space needs
   to be divided into subnet-ID and user-ID. Essentially, a VLSM
   (variable length subnet mask) type of approach has to be adopted at
   each node of an AS.
   Each node of the AS layer thus acts as the root of a tree whose
   leaves are small, independent customer networks that act as stubs.
   As the routing information of the inter-AS layer and of the AS
   layer need not be passed to any node inside the VLSM tree, each
   router inside the tree should maintain a default route for any
   address outside its network. With this approach, the load on each
   router of the service providers becomes negligible. Protocols that
   support VLSM with MPLS/VPN have to be implemented inside the tree
   (inside the VLSM tree, all the physical ports of a switch have to
   be configured with the subnet mask, so MPLS on top of a static
   routing table should do the rest).

   The fundamental assumptions on which this architecture rests can be
   summarized as follows:

   i) The entire network can be viewed as a network of regions or
   states, where each region or state has its own identity and
   communicates with the rest of the world through state border
   routers. Each region or state is a network of Autonomous Systems.
   Each region, as well as each Autonomous System inside it, has a
   fixed (maximum) prefix length.

   ii) Hardware resources are sufficient to maintain a flat address
   space at the inter-AS layer.

   Introducing a mesh structured hierarchy has several advantages:

      o Load at each router is reduced substantially.
      o The CIDR style approach and the complexity related to
        prefix reduction can be avoided.
      o Mesh structured hierarchy distributes traffic evenly.
      o Physical cable connections can be optimized.
      o Administrative issues become easier.

3.2. Determination of prefix lengths

   With this architecture, an IP address can be described as A.B.C.D,
   where the D part represents the user id.
   Each router in the inter-AS layer has two tables of information:
   one for inter-AS-top and another for the inter-AS-bottom of the
   inter-AS-top node to which it belongs. Each node of the AS layer
   has three tables of routing entries: one for inter-AS-top, one for
   inter-AS-bottom and one for routing inside the Autonomous System
   itself. In the worst case, a node inside an AS needs to maintain
   nMaxInterASTopNodes + nMaxInterASBottomNodes + nMaxASNodes entries
   in its routing table.

   Dynamic allocation of an area from a free pool of address space is
   more frequent at the AS layer than at the inter-AS-bottom layer. As
   OSPF supports all the features needed, it can be considered the
   default choice in the AS layer. The existing implementation of OSPF
   (Version 2) supports subnetting, by which an entire area can be
   represented as a combination of network address and subnet mask.
   This reduces the routing table substantially. With subnetting
   removed, every node inside an area has its own entry in the routing
   table (as in OSPF Version 1). The determining factor is therefore
   the maximum number of nodes inside an AS that OSPF can support once
   subnetting support is removed, and the prefix length of the AS
   layer is determined by this limit.

   With hierarchy introduced in the inter-AS layer, the number of
   entries in the BGP routing table is reduced substantially. Even if
   pA and pB are both chosen as 16 (at most 2^16 + 2^16 = 131,072
   entries), the number of routing entries is within the admissible
   range of the existing BGP protocol. It is, however, the
   responsibility of IANA to come up with a scheme for selecting
   nMaxInterASTopNodes and nMaxInterASBottomNodes. Each top level node
   will have nMaxInterASBottomNodes nodes. It would be a waste of
   address space if each country were assigned a top level node (e.g.,
   China has a population of 1,306,313,800, whereas Vatican City has
   only 920, according to a 2006 census). A moderate value of
   nMaxInterASBottomNodes is therefore desirable, with which larger
   countries will have a number of top level nodes; e.g., each state
   of the USA can be assigned a top level node. With areas introduced
   in the inter-AS-bottom layer, each top level node can be divided
   into a number of areas of heterogeneous sizes, so a group of
   neighboring countries with smaller populations can share the
   address space of a top level node. Similarly, the size of the
   user-id space has to be decided on the basis of the largest area
   that a VLSM tree should span. All these issues are entirely
   geo-political and have to be decided by IANA.

3.2.1. A pseudo optimal distribution of prefixes in a 64bit architecture

   For optimal use of cable connections, the VLSM tree is expected to
   be as short as possible. Also, a single organization may prefer to
   have its entire user-id space under the same network id. A 16-bit
   user-id may be insufficient for places like a large university
   campus, whereas 32 bits would be too large. Hence, a 24-bit user-id
   is a moderate choice; this is the class A address space in IPv4
   (also used as the space for private IP). As published in 1998 [6],
   OSPF can support an area with 1600 routers and 30K external LSAs,
   so 11 bits are needed to support this space. On the assumption that
   OSPF can support a much larger address space as hardware technology
   advances, and to keep space open for future expansion, 12 bits are
   assigned to the AS layer. 16 bits are assigned to the
   inter-AS-bottom layer. So if, on average, a 16-bit equivalent space
   is used within the user-id space (i.e.,
   one out of 256) and an 8-bit equivalent number of nodes is used
   inside an AS (16% of 1600), a top level node (with a 16-bit
   equivalent number of AS nodes) will generate 2^(16+8+16) = 2^40 IP
   addresses, which gives 8629 IP addresses per person in Japan (with
   a population of 127417200; Japan is at the 10th position from the
   top in the population list of the world). So, even if every country
   with a population less than or equal to that of Japan is assigned a
   top level node and every province or state of the countries with
   larger populations is assigned a top level node, the total number
   of nodes will be well under 1024. If a number of neighboring
   countries with smaller populations share a top level node, the
   total number of top level nodes will come down further. This
   suggests that a 62-bit equivalent space
   (10(pA)+16(pB)+12(pC)+24(user-id)) will be good enough for unicast
   addresses. This distribution expects OSPF to support 65K (64K+1K)
   external LSAs.

   The 64-bit address space may be divided into two 63-bit blocks as
   follows:

   i. Global unicast addresses with the most significant bit set to 0.
   This space is divided equally into provider assigned (PA) address
   space with prefix 00 and provider independent (PI) address space
   with prefix 01. Provider independent address space is used for
   customers who would like to retain their number even after changing
   providers. As routing is based on PA addresses, each PI address
   will be associated with at least one PA address. Section 4
   describes issues related to PI addressing in detail.

   ii. The address space with the MSB set to 1 will be distributed
   among the remaining uses. Each of them will have a fixed prefix, to
   be determined in consultation with IANA.
   This distribution will be based on the requirements and the work
   already done in connection with IPv6, along with the following
   requirements:

      a) Router address space: Any node in the router address space
      will be designated with a prefix followed by A.B.C.router-id.

      b) Address space for multicasting:

      c) Address space for private IP: A 32-bit address space should
      be good enough for private IP.

3.2.2. Whether to go for a two-tier or three-tier hierarchy

   Establishing hierarchy in the inter-AS layer greatly reduces the
   number of BGP entries, but leads to inefficient use of address
   space for geo-political reasons. If the hierarchy in the inter-AS
   space is removed, the entire 26-bit (10+16) space becomes available
   to a single layer and the inter-AS space is used to its full
   extent, but the number of external LSAs (and/or the number of
   entries in the BGP table) increases dramatically. The choice
   therefore depends on the extent to which OSPF can support external
   LSAs. BGP expects the packet length to be limited to 4096 bytes. In
   a CIDR based environment, BGP works within this limit by means of
   prefix reduction. As the number of inter-AS nodes increases, BGP
   has to change this limit in order to work in a flat address space.
   The alternative is to divide the inter-AS space into a number of
   areas as described in section 3.1; the area border routers then
   advertise the aggregated information to the rest of the world. BGP
   may have to incorporate both options at the same time. As the
   number of nodes in the inter-AS layer increases, the inter-AS space
   has to be split into two separate planes in order to reduce the
   number of entries in the routing table. A two-tier hierarchy can
   therefore be considered an interim state on the way to a three-tier
   hierarchy.
   If currently available data indicate that this is sufficient for
   present needs, it is worth examining to what extent it will remain
   sufficient in the future. Assignment of inter-AS nodes in a
   two-tier hierarchy should be based on geographical distribution, as
   if it were part of a three-tier hierarchy. Otherwise, introducing a
   three-tier hierarchy in the future will become another difficult
   task. According to a 2011 report, the BGP routing table carried
   about 400,000 entries. With this growing trend, BGP may have to
   change the packet length limit even in a CIDR based environment.
   With a two-tier hierarchy, the number of entries in the routing
   table will come down drastically, and with the three-tier approach
   it will come down further.

3.3. Issues related to Satellite communications

   Establishing hierarchy in the inter-AS layer assumes that the only
   way for two autonomous systems in two different top level nodes to
   communicate is through their SBRs. If two autonomous systems inside
   the same top level node communicate through satellite, this is
   considered a direct link between them. Whenever autonomous system
   'ASa' of top level node 'A' communicates with autonomous system
   'ASb' of top level node 'B' through satellite, the traffic has to
   go through their state border routers; i.e., a satellite port
   inside 'A' that communicates with a satellite port inside 'B' is
   considered a state border router. If multiple such ports exist
   inside node 'A', all of them are equidistant from any port inside
   'B'. Any satellite port inside 'B' therefore needs prior knowledge
   of the list of autonomous systems under the purview of each port
   inside 'A'. All the satellite ports of 'A' thus have to exchange
   this group information with all the satellite ports of 'B', and
   vice versa.
   Each such group of autonomous systems can be considered a cluster
   of autonomous systems inside an area of a top level node. If the
   number of such ports is small, heuristics can be applied when
   assigning AS numbers in order to reduce the processing time during
   the circuit establishment phase. It becomes difficult to maintain
   such heuristics once the number of ports becomes large. So, in the
   case of satellite communication, the advantage of establishing
   hierarchy inside the inter-AS layer diminishes as the number of
   satellite ports increases. If a private corporation maintains its
   own satellite channel to communicate between its offices at distant
   locations, all of these offices are considered to be under the
   user-id space of its network. Service providers that provide
   satellite services to end-site customers can operate in the usual
   manner, as they provide connections to customer networks that act
   as stubs.

4. Provider Independent addressing, name services and multihoming

   Provider independent addressing can be thought of as naming a host
   with a number. It can be used by customer networks that would like
   to retain their number even after changing their service provider;
   it is also useful for designating a host uniquely if the customer
   network is multihomed. Just as in name services the address
   corresponding to a name has to be resolved before communication can
   start, the same is required for PI addressing. Each globally unique
   PI address will be associated with at least one global unicast
   provider assigned address. For a host with a single interface, the
   number of associated PA addresses equals the number of service
   providers with which the customer network is associated.

   As the source, the destination or both may be multihomed, there can
   be multiple paths for communication between two hosts.
   This has to be taken into account both for name services and for PI
   addressing.

   A system call needs to be introduced to get the source address based
   on the destination address. If an application program needs to use
   the destination address directly, it needs to use this system call.

      int getcommaddr(int sockfd, struct in_addr *dst,
                      struct addr_pair *endpts);

   'addr_pair' holds the addresses of the communication end points as
   follows:

      struct addr_pair {
          struct in_addr src;
          struct in_addr dst;
      };

   'getcommaddr'[8] returns the number of source-destination pairs
   available for communication; the parameter 'endpts' will hold the
   array of these address pairs. The array is sorted based on the best
   possible route. 'sockfd' is used to get the assigned 'type of
   service'. So, an application program needs to set its type of
   service before using this call.

   'getcommaddr' needs to call a routine 'getmappedaddr' to resolve the
   mapped provider assigned addresses of a provider independent
   address.

      int getmappedaddr(struct in_addr *piaddr,
                        struct in_addr *mpiaddr);

   'getmappedaddr' returns the number of mapped addresses, and
   'mpiaddr' will hold their values.

   Users may use a name instead of an IP address to reach the
   destination. A new system call, 'gethostbynamewithsrcaddr', needs to
   be introduced as an extension to 'gethostbyname' as follows:

      struct hostent *gethostbynamewithsrcaddr(int sockfd,
                                               const char *name,
                                               int *nroutes,
                                               struct addr_pair *endpts);

   'gethostbynamewithsrcaddr'[8] takes 'name' and 'sockfd' as input
   parameters and finds the best possible route to reach the
   destination. It returns a pointer to the 'hostent' structure, as
   'gethostbyname' does. The parameter 'nroutes' receives the number of
   possible routes, and the corresponding source and destination
   addresses are assigned to 'endpts' in sorted order.
   'sockfd' is used to get the assigned 'type of service'. So, an
   application program needs to set its type of service before using
   this call.

   An application program needs to try these address pairs starting
   from the top (i.e. the 0th) to establish a connection with the
   destination. It needs to bind the source address 'src' and then
   connect to the destination address 'dst'.

4.1.  PI address Resolution

   This section proposes a solution for PI address resolution that
   follows the approach of DNS[7] with the necessary differences. Just
   like the name space in DNS, the entire address range with prefix 01
   will be the address space used by PI addresses. Servers that hold
   the mapping between PI addresses and the corresponding PA addresses
   will be called PIMapServers, and the programs used to resolve
   addresses will be called PIMapResolvers.

   Whereas DNS uses names in a hierarchical format to resolve
   addresses, PI address resolution is based on the prefix of the PI
   address being resolved. The prefix is determined by the
   architectural model used for the internet. From the prefix, the
   addresses of a list of servers can be found; these act as regional
   servers and are used to resolve the mapped PA addresses
   corresponding to that PI address. A prefix serves a fixed address
   space within the entire PI address space. The address space
   belonging to a prefix will be distributed among customer networks of
   heterogeneous sizes. Address space allocation and the mapping of
   associated PA address(es) will be assigned by a regional authority.
   The regional authority will be fully responsible for the operation
   of the regional servers in that region.

   As in DNS, there are some root servers with fixed addresses, under
   which there are some prefixes that act as top-level domains.
   In the case of a CIDR based hierarchy, these prefixes may be of
   different lengths, selected based on the requirements. Each prefix
   in a top level domain can be split further into a number of prefixes
   with the approach of CIDR. This tree structured hierarchy keeps
   growing until we reach prefixes associated with regional servers.
   Each prefix associated with a regional server will be distributed
   among customer networks of various sizes, as well as among prefixes
   that are again associated with regional servers, with the approach
   of CIDR. These regional servers can be considered equivalent to the
   authoritative name servers of DNS, which are associated with zones.
   As stated earlier, prefixes starting with "00" will be assigned to
   provider assigned addresses and prefixes starting with "01" to
   provider independent addresses, whereas prefixes starting with "1"
   will be assigned to addresses of all other types.

   As "Mesh Structured Hierarchy" has an inherent hierarchy, this
   hierarchy goes up to two levels. As usual, there will be some root
   servers with fixed assigned addresses. Each root server will have
   prefixes of the form "01.A" that act like top level domains. Under
   each top level domain, there will be entries with prefixes "01.A.B".
   Within a region "A.B", every global PA address is represented as
   "00.A.B.C.user-id". In order to support customer networks of
   heterogeneous sizes with the approach of VLSM, the "user-id" portion
   is further divided as "subnet-id.user-id". So, the effective network
   prefix of a customer network in PA address space is
   "00.A.B.C.pa-subnet-id". Within a region "A.B", the entire PI
   address space with prefix "01.A.B" will be distributed among
   customer networks of heterogeneous sizes. So, the effective network
   prefix of a customer network with PI addresses will be
   "01.A.B.pi-subnet-id".
   A particular prefix "01.A.B.pi-subnet-id" will be mapped to at least
   one provider assigned prefix of the same prefix length. A multihomed
   customer network within "A.B" that receives services from two
   service providers will have prefixes "00.A.B.C1.pa-subnet-id1" and
   "00.A.B.C2.pa-subnet-id2". A PI address prefix "01.A.B.pi-subnet-id"
   of the same length will be mapped to both of these prefixes in PA
   address space. Every region "A.B" will have a regional server and
   backup server(s), up to a maximum (say 4), with net addresses
   "00.A.B.server1", "00.A.B.server2", "00.A.B.server3" and
   "00.A.B.server4".

   Each PIMapServer will have a database of records that hold the
   information needed to resolve PI addresses. The in-memory copy of a
   region will be an array of records, where each record has the
   following format.

   +------------+---------+------+-----+-------+-----------+
   | NetAddress | NetMask | Type | TTL | NAddr | Addr(1-4) |
   +------------+---------+------+-----+-------+-----------+

   The first two fields, "NetAddress/NetMask", represent the PI address
   range of a network. "Type" is one of Domain/Referral/Individual/
   SingleEntry/Default and determines how a query and the rest of the
   fields of the record have to be processed. A PI address can have a
   maximum of four mapped PA addresses. "Addr1", "Addr2", "Addr3" and
   "Addr4" hold the corresponding PA addresses, and "NAddr" holds the
   number of such addresses. The field "TTL" is a 32 bit integer
   measured in seconds, with the same meaning and approach as defined
   in the specification of DNS[7]. When a server receives a query for
   an address "X", it extracts from its database the record of the
   network that matches "X" based on "NetAddress/NetMask". If no
   matching record is found, a negative response is sent. Based on the
   "Type" of the record, the query is processed in the following
   manner.

   Type=Domain:

   This is the most common type.
   A customer network that does not want to maintain a map server opts
   for this option. In this case there will be a one to one mapping
   between a PI address and the corresponding PA addresses. The fields
   "Addr1"/"Addr2"/"Addr3"/"Addr4" hold the PA net addresses
   corresponding to the PI address of the network. The server sends the
   matching record to the resolver with Type=Domain. The resolver
   extracts the user-id portion of "X" and derives the corresponding
   mapped PA addresses from "Addr1"/"Addr2"/...etc.

   Theoretically, the "A.B" portion of a PI address need not match the
   "A.B" portion of the corresponding PA addresses. Consider a large
   corporation that has its corporate office and a branch office within
   the same region of a particular "A.B" and some other offices with
   different values of "A.B". The corporation can maintain a contiguous
   range of PI addresses for ease of operation. It needs to split the
   entire PI address range among its offices and assign the
   corresponding PA addresses. In order to minimize the path of a
   query, it is desirable that the "A.B" of a PI address and of its
   mapped PA addresses belong to the same region.

   Type=Referral:

   This is used when an address within the domain
   "NetAddress"/"NetMask" has to be processed by another map server.
   That map server may itself be another regional server or a server
   within a customer network.

   A customer network that would like to have direct control over the
   mapping of its addresses needs to opt for this option.
   "Addr1"/"Addr2"/"Addr3"/"Addr4" of the database entry hold pointers
   to the information associated with each map server. "NAddr" holds
   the number of map servers that can be referred to. The information
   for each server holds the following values: PI address of the map
   server + number of PA addresses to reach the map server + PA
   addresses of the map server.
   Any one of these map servers needs to be queried for further
   processing. A server may act either in recursive mode or in
   iterative mode depending on its implementation, just as in DNS. A
   large corporation may have different offices, and each (or some) of
   them may maintain a map server based on its policies.

   When a server needs to handle a particular address separately, it
   needs to set "NetAddress" to that particular address and set all the
   bits of "NetMask" to "1". The "Type" field has to be set to
   "SingleEntry" (which is similar to the type Address (A) in DNS). If
   some addresses need to be handled separately while a common rule
   (like Type=Domain) applies to the rest, the records of the
   individual entries should be processed first and the common rule
   afterwards. For the common rule, "Type" has to be set to "Default".
   So, a server of a customer network may have database entries with
   Type=Domain/Referral/SingleEntry/Default. An entry with Type=Default
   makes sense for a server (or a master file), but not from the point
   of view of a resolver. So a server needs to extract the PA
   addresses, form a record with Type=SingleEntry and send it back to
   the resolver.

   For a host having multiple interfaces, each interface may be
   assigned PA addresses supplied by all the service providers, but it
   is desirable that the PI address gets mapped to only one of them
   (preferably, the PI address is mapped to the PA address associated
   with the CE router to which the interface has the shortest path).

   Type=Individual:

   This is meant for individual users who opt for services, such as
   telephony, that need a stable PI address. With this option a mobile
   user may keep its PI address after changing its service provider. A
   map server needs to maintain some networks with a range of PI
   addresses in its database.
   When a query for an address "X" is received, the server needs to
   fetch the corresponding record, in which "Addr1" holds a pointer to
   an open file descriptor (or a pointer to the in-memory copy) of a
   separate data file. That data file holds a one to one mapping
   between each assigned PI address and its corresponding PA address.
   The definition of these networks and the assignment of individual PI
   addresses have to be done by the regional authority.

   As with Type=Default, Type=Individual does not make any sense to a
   resolver. So, the server needs to extract the PA address, form a
   record with Type=SingleEntry and send it back to the resolver.

   As stated above, this solution is based on the approach of DNS. For
   ease of implementation and to make use of existing source code
   related to DNS (e.g. BIND), most of the features have been taken
   from DNS. Wherever differences arise, the approach followed by this
   document has to be accepted.

   IANA has to assign a port (e.g. 53 in the case of DNS) for its
   UDP/TCP based implementation.

4.1.1.  Record Format

   Each record (the way it appears in a master file or is used for
   communication) will have the following format:

      NetAddress/NetMask + Type (8 bit unsigned int) + TTL + RDATA
      (Type specific information)

   Record types are primarily the types of records described above,
   along with three other types: SOA (Start of a zone of authority),
   MPS (host with Type=SingleEntry that acts as a map server for this
   zone) and DFL (Data File). These types are mainly useful in the
   context of processing AXFR/IXFR/NOTIFY/DFXFR/DFIXFR messages.

   Types are defined as follows:

   Types                   values   comments
   -----------------------------------------------------------
   SEN (SingleEntry)         1      same as type A (address) in DNS
   MPS (MapServer)           2      Map server
   DMN (Domain)              3
   DEF (Default)             4
   REF (Referral)            5
   SOA (Start of a zone)     6
   IND (Individual)          7
   DFL (Data File)           8
   -----------------------------------------------------------

   RDATA of the different types will appear as follows:

   Type=SOA:
      PI address of server + SERIAL + REFRESH + RETRY + EXPIRE +
      MINIMUM (the meaning and values of SERIAL/REFRESH/RETRY/EXPIRE/
      MINIMUM are the same as defined in section 3.3.13 of RFC
      1035[11])

   Type=(SEN/MPS):
      NAddr (number of addresses) + corresponding PA addresses

   Type=(DMN/DEF):
      NAddr (number of addresses) + corresponding Net addresses

   Type=REF:
      NAddr (number of map servers) + for each map server (PI address
      of map server + NAddr (number of addresses of map server) +
      corresponding PA addresses)

   Type=IND:
      NAddr (=1) + full path name of the data file

   Type=DFL:
      Data file name + SERIAL + number of records in the data file (32
      bit unsigned int)

   When used in communication, a data file name is encoded as its
   length (8 bit unsigned int) followed by the octets of the string.

   The TTL value of a record has to be set to 0 if it is not relevant
   or if the value associated with the SOA record is to be used.

4.1.2.  Messages

   In order to support most of the features of DNS, the message format
   has been kept almost the same as that of DNS. So, all the relevant
   fields will be processed exactly as they are in DNS, and everything
   that is not relevant has to be ignored. The rest of this section
   describes where and how changes have to be made.
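
   As an illustration only, the record layout and type codes of
   Section 4.1.1, the lookup step of Section 4.1 and the Type=Domain
   mapping can be sketched in C as below. The names 'pimap_record',
   'pimap_lookup' and 'pimap_map_domain' are hypothetical and not part
   of this specification. The sketch assumes 64 bit addresses and
   contiguous net masks, and it assumes that the most specific matching
   record wins, which is one way of processing individual entries
   before a Default entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes as listed in the table of Section 4.1.1. */
enum pimap_type {
    PIMAP_SEN = 1, PIMAP_MPS = 2, PIMAP_DMN = 3, PIMAP_DEF = 4,
    PIMAP_REF = 5, PIMAP_SOA = 6, PIMAP_IND = 7, PIMAP_DFL = 8
};

#define PIMAP_MAX_ADDR 4  /* a PI address maps to at most four PA addresses */

/* Hypothetical in-memory record; 64 bit addresses are an assumption. */
struct pimap_record {
    uint64_t net_address;
    uint64_t net_mask;
    uint8_t  type;                   /* one of enum pimap_type */
    uint32_t ttl;                    /* seconds */
    uint8_t  naddr;                  /* number of valid entries in addr[] */
    uint64_t addr[PIMAP_MAX_ADDR];
};

/* Find the record whose NetAddress/NetMask covers x. The most specific
 * mask wins, so a SingleEntry record is chosen before a Default record
 * that also matches. Returns NULL when nothing matches, in which case
 * the server would send a negative response. */
const struct pimap_record *
pimap_lookup(const struct pimap_record *db, size_t n, uint64_t x)
{
    const struct pimap_record *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if ((x & db[i].net_mask) != (db[i].net_address & db[i].net_mask))
            continue;
        if (best == NULL || db[i].net_mask > best->net_mask)
            best = &db[i];
    }
    return best;
}

/* Type=Domain: keep the user-id part of x and substitute each PA net
 * address for the PI network part. Writes rec->naddr addresses to out
 * and returns that count. */
size_t
pimap_map_domain(const struct pimap_record *rec, uint64_t x,
                 uint64_t out[PIMAP_MAX_ADDR])
{
    assert(rec->type == PIMAP_DMN && rec->naddr <= PIMAP_MAX_ADDR);

    for (size_t i = 0; i < rec->naddr; i++)
        out[i] = (rec->addr[i] & rec->net_mask) | (x & ~rec->net_mask);
    return rec->naddr;
}
```

   The mapping reuses the PI net mask for the PA prefixes because a PI
   prefix is mapped to PA prefixes of the same length.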

   As defined in RFC 1035, the top level format of a message is divided
   into 5 sections (some of which are empty in certain cases), shown
   below:

    +---------------------+
    |        Header       |
    +---------------------+
    |       Question      | the question for the map server
    +---------------------+
    |        Answer       | answering part of the question
    +---------------------+
    |      Authority      | authoritative map server
    +---------------------+
    |      Additional     | additional information
    +---------------------+

   The header section has been retained as defined in RFC 5395[12], as
   follows:

      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   OpCode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                QDCOUNT/ZOCOUNT                |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                ANCOUNT/PRCOUNT                |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                NSCOUNT/UPCOUNT                |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   The question section will have two parts:
   QType (one octet unsigned int) + QData.

   Query types are defined as follows:

   QTypes      values   comments
   -----------------------------------------------------------
   SEN            1     query for mapped PA address
   SOA            6     query information related to SOA
   DFL            8     query information related to data file
   DFXFR        249     data file transfer
   DFIXFR       250     incremental data file transfer
   IXFR         251     incremental authoritative data file xfr
   AXFR         252     authoritative data file transfer
   -----------------------------------------------------------

   QData will hold values based on QType.

   The following paragraphs describe issues related to QType=SEN.
   Issues related to all other QTypes (i.e. those related to file
   transfer) will be discussed afterwards.
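
   The following C fragment is a minimal sketch of how a query message
   with a 12 octet header and a question section of the form QType +
   QData could be laid out. The function name 'pimap_build_sen_query'
   is hypothetical. The sketch assumes 64 bit addresses carried in
   network byte order and leaves all flag bits at zero; none of these
   details are fixed by this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PIMAP_HEADER_LEN 12  /* six 16 bit fields, as in the DNS header */
#define PIMAP_QTYPE_SEN   1

/* Store a 16 bit value in network byte order. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Build a QType=SEN query for pi_addr into buf.
 * Assumptions: 64 bit addresses, QData in network byte order, and a
 * flags word of zero (QR=0, OpCode=0).
 * Returns the message length, or 0 if buf is too small. */
size_t pimap_build_sen_query(uint16_t id, uint64_t pi_addr,
                             uint8_t *buf, size_t buflen)
{
    const size_t need = PIMAP_HEADER_LEN + 1 + 8;

    assert(buf != NULL);
    if (buflen < need)
        return 0;

    memset(buf, 0, need);
    put16(buf, id);                  /* ID */
    put16(buf + 4, 1);               /* QDCOUNT = 1 */

    buf[PIMAP_HEADER_LEN] = PIMAP_QTYPE_SEN;    /* QType, one octet */
    for (int i = 0; i < 8; i++)                 /* QData, big-endian */
        buf[PIMAP_HEADER_LEN + 1 + i] = (uint8_t)(pi_addr >> (56 - 8 * i));
    return need;
}
```

   Under these assumptions a QType=SEN query occupies 21 octets: 12 for
   the header, 1 for QType and 8 for the PI address.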

   For QType=SEN(1): QData=PI address that needs to be resolved.

   The answer section, authority section and additional section will
   each hold a number of resource records, where the number is
   specified in the header.

   On receiving a query, the map server returns the matching record
   from its database. If the response is an address, the answer section
   will hold a record of one of these two types: SEN/DMN.

   If Type=DMN, the resolver needs to extract the mapped addresses as
   described in section 4.1. In this case the entire address range
   appears in the form of NetAddress/NetMask. This has the advantage
   that, while caching data for one particular address, the resolver
   obtains the information for the entire address range.

   If the response is a referral, the answer section will be empty and
   the authority section will hold the record with Type=REF.

   If the server supports recursion, then for each iteration in which
   it receives a record with Type=REF, it needs to push that record to
   the additional section of the message to be sent to the resolver.
   So, the additional section will hold the Type=REF records of the
   chain of the tree through which the PA addresses have been resolved.

4.1.3.  Master file and data file

   Section 5 of RFC 1035 states:

   "Master files are text files that contain RRs in text form. Since
   the contents of a zone can be expressed in the form of a list of RRs
   a master file is most often used to define a zone, though it can be
   used to list a cache's contents."

   Section 5.1 of RFC 1035 states:

   "The format of these files is a sequence of entries. Entries are
   predominantly line-oriented, though parentheses can be used to
   continue a list of items across a line boundary, and text literals
   can contain CRLF within the text. Any combination of tabs and spaces
   act as a delimiter between the separate items that make up an entry.
   The end of any line in the master file can end with a comment. The
   comment starts with a ";" (semicolon)."

   Master files follow the same approach and format as in DNS, as
   described in section 5 of RFC 1035, with the necessary differences.

   An example master file may look as follows:

   @  "PI NetAddr"/"Net Mask" SOA "PI address of primary server" (
                              20       ; SERIAL
                              7200     ; REFRESH
                              600      ; RETRY
                              3600000  ; EXPIRE
                              60 )     ; MINIMUM
      "PI NetAddr"/"Net Mask" MPS 0 NAddr "PA addresses"
      "PI NetAddr"/"Net Mask" SEN 0 NAddr "PA addresses"
      "PI NetAddr"/"Net Mask" DMN 0 NAddr "Net addresses"
      "PI NetAddr"/"Net Mask" DEF 0 NAddr "Net addresses"
      "PI NetAddr"/"Net Mask" IND 0 NAddr(=1) "Data file name"

   A data file contains a sequence of entries, each on a separate line.
   Each entry is a mapping between a PI address and its associated PA
   address(es), separated by space(s). Entries are generally sorted by
   PI address. As in a master file, a comment starts with a ";"
   (semicolon) and ends at the end of the line. Data files are commonly
   associated with the map servers maintained by a regional authority;
   they are not generally associated with the map servers maintained by
   individual customer networks. A data file entry may appear as
   follows:

      "PI Address" NAddr "PA Addresses"

   A map server may have a number of data files. These files have to be
   listed in another file (a supporting file, the way the boot file
   "named.boot" is used in BIND) that holds information about each of
   them. An entry in that file follows the format of a record
   (Type=DFL) and has the following fields:

      "PI NetAddr"/"NetMask" Type(DFL) TTL "Data File Name" SERIAL
      "Number of records"

   This file will be used to process messages with QType=DFL, which
   support data file transfer and incremental data file transfer.
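
   As a rough illustration of the data file entry format described
   above, the C sketch below parses one line of a data file. This
   document does not define the textual form of an address, so the
   sketch assumes that addresses are written as hexadecimal 64 bit
   values; the names 'df_entry' and 'pimap_parse_datafile_line' are
   hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DF_MAX_ADDR 4   /* a PI address has at most four mapped PA addresses */

struct df_entry {
    uint64_t pi_addr;
    int      naddr;
    uint64_t pa_addr[DF_MAX_ADDR];
};

/* Parse one data file line: "PI Address" NAddr "PA Addresses".
 * Assumption: addresses are hexadecimal 64 bit values.
 * Returns 1 if an entry was parsed, 0 for a blank or comment-only
 * line, and -1 for a malformed line. */
int pimap_parse_datafile_line(const char *line, struct df_entry *out)
{
    char buf[256];
    char *p, *end;
    size_t len;
    unsigned long n;

    assert(line != NULL && out != NULL);

    len = strcspn(line, ";\r\n");    /* drop comment and line ending */
    if (len >= sizeof buf)
        return -1;
    memcpy(buf, line, len);
    buf[len] = '\0';

    p = buf;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;

    errno = 0;
    out->pi_addr = strtoull(p, &end, 16);
    if (end == p || errno != 0)
        return -1;
    p = end;

    n = strtoul(p, &end, 10);
    if (end == p || n < 1 || n > DF_MAX_ADDR)
        return -1;
    out->naddr = (int)n;
    p = end;

    for (int i = 0; i < out->naddr; i++) {
        out->pa_addr[i] = strtoull(p, &end, 16);
        if (end == p || errno != 0)
            return -1;
        p = end;
    }

    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' ? 1 : -1;      /* reject trailing garbage */
}
```

   A server that keeps the parsed entries sorted by PI address could
   answer Type=Individual queries with a binary search over them.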

   For QType=DFL(8): QData="PI NetAddr"/"NetMask" of the desired network
   For QType=SOA(6): QData="PI NetAddr"/"NetMask" of the desired zone

   A map server returns a record of Type=DFL on receiving a query with
   QType=DFL, whereas it returns a record of Type=SOA on receiving a
   query with QType=SOA.

4.1.4.  Zone maintenance and transfers

   Section 4.3.5 of RFC 1034 states:

   "The general model of automatic zone transfer or refreshing is that
   one of the name servers is the master or primary for the zone.
   Changes are coordinated at the primary, typically by editing a master
   file for the zone. After editing, the administrator signals the
   master server to load the new zone. The other non-master or
   secondary servers for the zone periodically check for changes (at a
   selectable interval) and obtain new zone copies when changes have
   been made.

   To detect changes, secondaries just check the SERIAL field of the SOA
   for the zone. In addition to whatever other changes are made, the
   SERIAL field in the SOA of the zone is always advanced whenever any
   change is made to the zone."

   Section 1.2 of RFC 5936 states:

   "A DNS implementation is not required to support AXFR, IXFR, and
   NOTIFY, but it should have some means for maintaining name server
   coherency. A general-purpose DNS implementation will likely support
   AXFR (and in the same vein IXFR and NOTIFY), but turnkey DNS
   implementations may exist without AXFR."

   Zone maintenance and transfer will follow the same approach as DNS
   with a few minor updates. Data files will be updated much more
   frequently than the master file. That is why the transfer (and
   incremental transfer) of a data file is treated separately from the
   transfer (and incremental transfer) of the master file.

   For all messages of QType=AXFR/DFXFR/IXFR/DFIXFR, QData="PI
   NetAddr"/"NetMask" of the desired zone or the desired network.
   A NOTIFY message needs to indicate which file has been updated,
   followed by the related information. So, if the master file has been
   changed, a NOTIFY message with query type SOA will be sent, and a
   NOTIFY message with query type DFL will be sent if a data file has
   been changed.

   Transfer of the master file will be the same as transfer of a master
   file in DNS, followed by transfer of all the data files; i.e.
   processing of AXFR follows the same approach as in DNS, followed by
   DFXFR for all the data files. To make this happen, at the end of
   transferring the contents of the master file, the server (of the
   AXFR message) needs to send a NOTIFY message to the client (i.e. the
   secondary server) for each of the data files belonging to that zone.
   On processing the NOTIFY for a data file, the secondary server needs
   to send DFIXFR to the primary if the data file already exists;
   otherwise it needs to send DFXFR. Incremental update of the master
   file (IXFR) will be the same as IXFR in DNS with a minor update. If
   the client of IXFR finds that a new data file has been introduced,
   it issues DFXFR for that data file. Similarly, if the entry of a
   data file has been deleted, the client deletes the corresponding
   data file.

   Processing of DFXFR follows the same approach as AXFR in DNS.
   Similarly, processing of DFIXFR follows the same approach as IXFR in
   DNS. While transferring a data file record, an equivalent record of
   type SEN needs to be sent with the PI address and mapped PA
   address(es) taken from the data file record. Wherever a record of
   type SOA is sent while processing AXFR/IXFR in DNS, a record of type
   DFL needs to be sent while processing DFXFR/DFIXFR.

   For AXFR, IXFR and NOTIFY in DNS, one needs to follow RFC 5936[13],
   RFC 1995[14] and RFC 1996[15] respectively.

5.  Issues related to IP mobility

   An interface of a customer network may have several IP addresses
   (e.g.
   for a multihomed customer site, each interface will have multiple
   global unicast addresses, and it may also have private addresses). A
   mobile node that has moved to a customer network which gets service
   from a service provider and maintains private IP addresses will have
   at least three IP addresses: a provider assigned unicast address, a
   private address and its permanent "Home Address". The "Home Address"
   will be aliased with the provider assigned address (i.e. the
   co-located care-of address). So the interface structure needs an
   additional field to hold the value of the care-of address. The PCB
   structure will have an additional field 'inp_lcladdr', which holds
   the current provider assigned address that a foreign node needs to
   use for communication. The field 'inp_laddr', which is used to hold
   the value of the local address, will hold the "Home Address" of a
   mobile node. Similarly, the PCB needs another new field,
   'inp_fcladdr', to support a mobile destination. The existing field
   'inp_faddr', which is used for the foreign address, will hold the
   "Home Address" of the mobile node. For customers with a PI address
   who would like to have mobility support, the mapped address will be
   considered the "Home Address" of the mobile node.

   An outgoing packet from a mobile node in a foreign site needs to be
   stacked with the associated care-of address. While initiating
   communication, the 'bind' system call needs to go through the
   interface list and fetch the associated structure to check whether
   the source address is aliased, and it needs to fill in the value of
   'inp_lcladdr' of the PCB accordingly.

   When TCP receives a SYN for connection establishment, it allocates a
   PCB and assigns the values for 'inp_laddr' and related fields.
   During this phase, TCP also needs to check whether the local address
   is aliased (based on the fields of the interface structure; this is
   applicable for a mobile node at a foreign site) and needs to fill in
   the value of 'inp_lcladdr' accordingly. Similarly, if the
   destination address is found to be aliased, based on the stacking
   type, it needs to fill in the field 'inp_fcladdr'.

   IP address stacking can be performed with the approach introduced in
   section 6.4 of RFC6275[9]. RFC6275 describes the stacking of IP
   addresses for a destination address (let us call it type 0
   stacking). Two more types of stacking need to be introduced: type 1
   stacking, where only the source address appears in the stack, and
   type 2 stacking, where both the source address and the destination
   address appear in the stack in a particular order.

   A protocol output routine like 'tcp_output' or 'udp_output' needs to
   fill the IP packet in the following manner.

   If the socket contains a valid 'inp_lcladdr', use 'inp_lcladdr' as
   the source address, and 'inp_laddr' will appear in the stack. If the
   socket contains a valid 'inp_fcladdr', use 'inp_fcladdr' as the
   destination address, and 'inp_faddr' will appear in the stack. If
   only 'inp_fcladdr' contains a valid address whereas 'inp_lcladdr' is
   NULL, use type 0 stacking. If only 'inp_lcladdr' contains a valid
   address whereas 'inp_fcladdr' is NULL, use type 1 stacking. If both
   'inp_lcladdr' and 'inp_fcladdr' contain valid addresses, use type 2
   stacking.

   A protocol input routine like 'tcp_input' or 'udp_input' needs to
   process the packet in the reverse order based on the type of
   stacking.
   For type 0 stacking, use the address in the stack as the destination
   address; for type 1 stacking, use the address in the stack as the
   source address; for type 2 stacking, use both the source address and
   the destination address from the stack.

5.1.  Changes expected with the specifications related to IP mobility

   RFC6275 demands correspondent node binding from mobile nodes for
   route optimization. This binding is required when a connection gets
   established as well as when the mobile node changes its address
   space. Applications such as HTTP open multiple, very short-lived
   connections at run time. If mobile nodes need to send binding
   messages for all of these connections, the network will be congested
   unnecessarily. This congestion can be avoided by establishing the
   binding at the time of connection establishment itself. So, if the
   TCP server happens to be mobile, it will set the value of
   'inp_lcladdr' in the stack while sending SYN+ACK. The TCP client,
   which initiates communication through 'connect', needs to set the
   'inp_fcladdr' field on receiving SYN+ACK. With this approach,
   correspondent node binding messages need to be sent only when a
   mobile node moves from one address space to another.

   Route optimization is not applicable to applications of multicast
   type. In these cases packets need to be forwarded with the mechanism
   of reverse tunneling, using the approach of "IP Encapsulation within
   IP" as defined in RFC2003. In order to support packet delivery both
   with route optimization and with "Encapsulating Delivery Style",
   depending on the application type, the protocol control block needs
   another field, 'inp_hagentaddr', to hold the address of the home
   agent of the mobile node. The interface structure also needs to have
   the same field.
   The 'bind' system call needs to go through the interface list to
   copy 'inp_hagentaddr' to the PCB along with 'inp_lcladdr', as
   described earlier. So, protocol output routines like 'tcp_output'
   and 'udp_output' need to fill in the packets based on the
   application type. In "Encapsulating Delivery Style", packets need to
   be formed in the following manner.

   The inner IP header will contain
      Source Address:      Home address of the mobile node
                           (i.e. 'inp_laddr')
      Destination Address: Address of the correspondent node
                           (i.e. 'inp_faddr')

   The outer IP header will contain
      Source Address:      Co-located care-of address of the mobile
                           node (i.e. 'inp_lcladdr')
      Destination Address: Address of the home agent of the mobile
                           node (i.e. 'inp_hagentaddr')
      Protocol field:      IP in IP

6.  Refinements over existing IPv6 specification

   As IPv6 was envisioned long before some of the newer technologies
   (e.g. MPLS) came into the picture, some refinements can be made to
   the existing specification. These considerations are related to
   bandwidth usage and performance inside switches. Experimental
   results show that a smaller packet size gives better results for the
   processing of RT packets. So, it is desirable to keep the IP packet
   header as small as possible.

   As described earlier, the evaluation of the parameters
   nMaxInterASTopNodes, nMaxInterASBottomNodes and nMaxASNodes is
   geo-political and has to be decided by IANA. Once these parameters
   are determined by mutual agreement, the values of pA, pB, pC and the
   prefix length of the user id can be determined. With a 64 bit
   address space, the IP header will be reduced by 16 bytes.

   The 'flow label' field of the IPv6 packet header may not be of any
   use when MPLS is in use. ATM used to have 4 priority classes. The
   first specification of IPv6, RFC-1883, used a 4 bit type of service
   field (named Priority) along with a 24 bit flow label field.
   These two were modified to an 8-bit type of service field and a
   20-bit flow label field in the current specification, RFC 2460.
   Too many priority classes may increase the complexity of processing
   inside switches. If the type of service field of the IPv6 header is
   reduced to 4 bits, as it was in RFC 1883, and the 'flow label'
   field is removed, another three bytes can be saved in the IPv6
   header.

   The 'Hop Limit' field has an 8-bit value in the existing
   specification. The role of this field needs to be discussed
   properly in the context of a large address space.

   RFC 4862 [10] introduces the concept of "Stateless auto
   configuration" with the goal that no manual configuration is
   required for individual machines before connecting them to the
   network. It first generates a link-local address from a link-local
   prefix and the link address (e.g. Ethernet, or E.164 for ISDN).
   This link-local address is used to configure the global unicast
   address and any other configurable parameters based on the Router
   Advertisement. Global unicast addresses are generated from the
   prefix supplied by the Router Advertisement and the link-specific
   interface identifier. This identifier can be as long as 64 bits.
   So, irrespective of the size of the network (it may have 10000 or
   100 nodes, or even fewer), every customer network will consume the
   equivalent of a 64-bit address space. This seems to be a huge
   blunder. What is expected is that the length of the interface
   identifier is just sufficient for the number of nodes supported by
   that subnet. In order to achieve this, the router itself or a
   server in that subnet needs to maintain a store from which
   interface identifiers are generated on request from individual
   hosts. It may be desirable that interface identifiers are generated
   by DHCP servers.
   With the option of generating the interface identifier through
   DHCP, changes in the auto configuration process can be looked at as
   follows:

   From the point of view of a host, it can be considered a two-step
   process. The host needs to send a Router Solicitation message to
   find out whether a router is present. The Router Advertisement
   message should include an option field that indicates whether
   prefix information should be configured through Router
   Advertisement or through DHCP. The host then needs to send a
   request message to get the interface identifier. If both pieces of
   information need to be obtained from a DHCP server, they can be
   obtained through a single message.

   From the server's point of view, it needs to maintain a database
   mapping each link-layer address to a subnet-specific interface
   identifier. The lifetime of an interface identifier has to be
   processed in the same way that existing DHCP implementations treat
   IP addresses.

   There seems to be another possible danger in obtaining prefix
   information through Router Advertisement. As the Router
   Advertisement comes in the form of an ICMP message, once it is
   received by the ICMP layer, the information about which interface
   the message was received on is lost. (This problem arises for hosts
   that have multiple interfaces, not all of which are attached to the
   same subnet.) So, auto configuration of a host has to be performed
   one interface at a time, with all other interfaces disabled. Once
   configuration of all the interfaces is done, all of them have to be
   enabled.

   If hosts are expected to reconfigure their addresses dynamically
   based on Router Advertisement messages, the router needs to send,
   for a certain amount of time, a special Router Advertisement
   message that includes the old prefix and the corresponding new
   prefix.
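   As an illustration only, the host-side effect of a message carrying
   an old prefix and its corresponding new prefix can be sketched in
   Python as below. The function name 'renumber', the use of Python's
   'ipaddress' module and the 128-bit IPv6 documentation prefixes are
   assumptions made for this sketch and are not part of this proposal;
   with a 64-bit address space the same masking would apply to a
   shorter address.

```python
import ipaddress


def renumber(address, old_prefix, new_prefix):
    """Return 'address' moved from old_prefix to new_prefix.

    Hypothetical helper: keeps the interface identifier (host bits)
    and swaps only the prefix bits announced by the router.
    """
    addr = ipaddress.IPv6Address(address)
    old_net = ipaddress.IPv6Network(old_prefix)
    new_net = ipaddress.IPv6Network(new_prefix)

    # Assumption of this sketch: both prefixes have the same length,
    # so the interface identifier fits unchanged under the new prefix.
    if old_net.prefixlen != new_net.prefixlen:
        raise ValueError("old and new prefixes must have the same length")

    # An address that was not formed from the old prefix is left alone.
    if addr not in old_net:
        return addr

    # Host mask selects the interface identifier bits of the address.
    interface_id = int(addr) & int(old_net.hostmask)
    return ipaddress.IPv6Address(int(new_net.network_address) | interface_id)


if __name__ == "__main__":
    print(renumber("2001:db8:1::2a", "2001:db8:1::/64", "2001:db8:2::/64"))
```

   Under these assumptions, an address formed from the old prefix
   keeps its interface identifier and only its prefix bits change,
   while an address outside the old prefix is returned unchanged.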
   In order to support multihoming [8], prefix information needs to
   include the fields 'default router' and 'next hop address' to reach
   the default router for each of the prefixes.

   In a 64-bit architecture, the link-local address can be formed from
   a link-local prefix and the link-layer address in a suitable
   manner; say, a 16-bit link-local prefix followed by a 48-bit
   link-layer address. For hardware that supports more than 48-bit
   addressing (say E.164), the least significant 48 bits may be used
   to generate link-local addresses.

7. Distributed processing and Multicasting

   With the inherent hierarchy involved in this architecture,
   distributed applications can also be structured in a suitable
   manner. Say, for a commonly used web-based application, there will
   be a master-level server at every top-level node. Any change in the
   application has to be synchronized among these master-level servers
   first. There might be servers at the middle layer (inside each
   inter-AS-bottom) inside each top-level node. Once the changes are
   reflected at the master node, all the servers at the middle layer
   need to update themselves from their master-level node. This will
   reduce network traffic substantially. The inherent hierarchy in the
   architecture will also help in establishing multicast trees in a
   similar manner. Work on these issues can progress only after this
   architecture is approved.

8. Transition to real IP from private IP

   Both CIDR based hierarchy and Mesh structured hierarchy expect a
   VLSM tree at the bottom. In VLSM, in real IP space with provider
   assigned (PA) addresses, assignment of network resources has to be
   associated with the address space to be used with the type of
   service.
   Within a typical switch supporting multiple types of ports, a line
   card of strength OC48 can be replaced with 4 line cards of strength
   OC12. An OC12 card may also be replaced with 4 OC3 cards. An OC12
   card may be attached to another switch with DS3 ports, and so on.
   At the customer network, the port density of a switch has to be
   directly proportional to the address block that the customer
   network will be assigned; i.e. each customer network has to be
   assigned a block of address space (say, 128, 256, 512, 1K, 2K etc).
   Within the switch these ports have to be assigned a net address/net
   mask the way VLSM works.

   In the IPv4 environment, providers have provided services in terms
   of the bandwidth of the ports, say a 2 Mbps, 4 Mbps or 1 Gbps line.
   If these ports were assigned addresses based on the number of users
   of the customer network, transition from private IP to real IP is
   simple. Consider a switch that has supplied a 2 Mbps line to a set
   of customers, each with between 1K and 2K users; each of them will
   be assigned a block of 2K. But if the number of users is not
   proportional to the bandwidth used, say the same 2 Mbps line is
   used by customers of sizes 1K, 2K, 10K and 16K respectively,
   reorganization will be needed where possible. This rearrangement
   may be possible within the switch itself or by connecting ports of
   appropriate sizes from a different switch; otherwise each of them
   has to be assigned an address block of 16K, or whatever is suitable
   with the way VLSM works. So, address block assignment in the VLSM
   tree has to grow in a bottom-up approach.

   Thus, transition of an existing provider network to a real IP space
   with a CIDR based approach, with no (or very little) rearrangement,
   is apparently not a difficult job.
   In a CIDR based approach, the sizes of the VLSM trees are
   heterogeneous, which leads to a very high number of routing
   entries. Mesh structured hierarchy is convenient for reducing the
   routing overhead as well as for distributing network resources in a
   suitable manner in the long run. Converting a CIDR based approach
   to Mesh structured hierarchy requires reorganization mainly in the
   routing domain and the splitting of very large trees (>24 bit
   address space) at the top.

   Section 3.2.1 reveals that in Mesh structured hierarchy a 64-bit
   architecture will be good enough for our needs in a provider
   assigned (PA) address space; the same is true for the CIDR based
   approach as well.

9. IANA Consideration

   This is a first level draft for a proposed standard. Hence, IANA
   actions should come into play at a later stage, if needed.

10. Security Consideration

   This document does not include any security related issues.

11. Acknowledgments

   The author would like to thank Professor Amitava Datta of the
   University of Western Australia for his review and constructive
   comments.

12. Normative References

   [1]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for
        IPv6 Hosts and Routers", RFC 4213, October 2005.

   [2]  Fuller, V. and T. Li, "Classless Inter-Domain Routing (CIDR):
        The Internet Address Assignment and Aggregation Plan",
        RFC 4632, August 2006.

   [3]  Huston, G., "Commentary on Inter-Domain Routing in the
        Internet", RFC 3221, December 2001.

   [4]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS Number
        Space", RFC 4893, May 2007.

   [5]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
        Translator (Traditional NAT)", RFC 3022, January 2001.

   [6]  Moy, J., "OSPF Standardization Report", RFC 2329, April 1998.

   [7]  Mockapetris, P.V., "Domain names - concepts and facilities",
        RFC 1034, November 1987.

   [8]  Bandyopadhyay, S., "Solution for Site Multihoming in a Real IP
        Environment", work in progress.

   [9]  Perkins, C., Ed., Johnson, D. and J. Arkko, "Mobility Support
        in IPv6", RFC 6275, July 2011.

   [10] Thomson, S., Narten, T. and T. Jinmei, "IPv6 Stateless Address
        Autoconfiguration", RFC 4862, September 2007.

   [11] Mockapetris, P.V., "Domain names - implementation and
        specification", RFC 1035, November 1987.

   [12] Eastlake 3rd, D., "Domain Name System (DNS) IANA
        Considerations", RFC 5395, November 2008.

   [13] Lewis, E. and A. Hoenes, Ed., "DNS Zone Transfer Protocol
        (AXFR)", RFC 5936, June 2010.

   [14] Ohta, M., "Incremental Zone Transfer in DNS", RFC 1995,
        August 1996.

   [15] Vixie, P., "A Mechanism for Prompt Notification of Zone
        Changes (DNS NOTIFY)", RFC 1996, August 1996.

13. Informative References

   [16] Postel, J., "Internet Protocol", STD 5, RFC 791,
        September 1981.

   [17] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
        RFC 1771, March 1995.

   [18] Deering, S. and R. Hinden, "Internet Protocol, Version 6
        (IPv6) Specification", RFC 1883, December 1995.

   [19] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [20] Deering, S. and R. Hinden, "Internet Protocol, Version 6
        (IPv6) Specification", RFC 2460, December 1998.

   [21] Rosen, E., Viswanathan, A. and R. Callon, "Multiprotocol
        Label Switching Architecture", RFC 3031, January 2001.

14. Author's Address

   Shyamaprasad Bandyopadhyay
   HL No 205/157/7, Kharagpur 721305, India
   Phone: +91 3222 225137
   e-mail: shyamb66@gmail.com