INTERNET DRAFT                                          S. Bandyopadhyay
draft-shyam-real-ip-framework-53.txt                   February 10, 2019
Intended status: Experimental
Expires: August 10, 2019

    An Architectural Framework of the Internet for the Real IP World
                  draft-shyam-real-ip-framework-53.txt

Abstract

   This document proposes an architectural framework for the Internet
   in the real IP world. It describes how a three-tier mesh structured
   hierarchy can be established in a large address space by
   fragmenting it into regions and sub-regions inside each of them.
   It shows how to make a transition from private IP to real IP
   without making significant changes to the existing network.
   Building on the useful work done through IPv6, it provides the
   necessary inputs from which a specification of IP with a 64-bit
   address space may emerge.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on August 10, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Table of Contents

   1. Introduction
   2. Background
   3. A Three-tier mesh structured hierarchical network
      3.1. Route propagation
      3.2. Determination of prefix lengths
         3.2.1. A pseudo optimal distribution of prefixes in
                a 64 bit architecture
         3.2.2. Whether to go for a two-tier or three-tier hierarchy
      3.3. Issues related to Satellite communications
      3.4. Setting default route inside VLSM tree
      3.5. IP VPN with MPLS inside VLSM tree
         3.5.1. Extension to RSVP-TE to support IP VPN inside
                VLSM tree
   4. Provider Independent addressing, name services and multihoming
      4.1. PI address Resolution
         4.1.1. Record Format
         4.1.2. Messages
         4.1.3. Master file and data file
         4.1.4. Zone maintenance and transfers
   5. Issues related to IP mobility
      5.1. Changes expected with the specifications related to
           IP mobility
   6. Refinements over existing IPv6 specification
   7. Distributed processing and Multicasting
   8. Transition to real IP from private IP
   9. IANA Consideration
   10. Security Consideration
   11. Acknowledgments
   12. Normative References
   13. Informative References
   14. Author's Address

1. Introduction

   The transition from IPv4 to IPv6 is in progress. Work has been done
   to upgrade individual nodes (workstations) from IPv4 to IPv6.
   There are also established documents describing how routers and
   switches can support IPv4 and IPv6 packets simultaneously in order
   to make the transition possible [1]. The CIDR [2] based
   hierarchical architecture of the existing 32-bit system is expected
   to continue in IPv6 with a larger address space. Concerns have been
   documented that the number of BGP table entries is becoming too
   large in the existing system [3]. At the same time, there are
   proposals to extend the Autonomous System number from 16 bits to
   32 bits to meet the demand [4]. The challenge lies in making the
   transition from IPv4 to a real IP world smooth, with the fewest
   possible changes.

   The term "real IP environment" refers to an environment where
   hosts in a customer network possess globally unique IP addresses
   and communicate with the rest of the world without the help of
   NAT [5]. Wherever applicable, this document reflects the changes
   required to the BSD 4.4 source code.

2. Background

   The existing system works with an Autonomous System (AS) layer and
   an inter-AS layer, using the CIDR approach. In order to meet the
   need within the 32-bit address space, Autonomous Systems of various
   sizes maintain a CIDR based hierarchical architecture. With the
   help of NAT [5], a stub network can maintain a user-ID space as
   large as a class A network and can still meet its need to
   communicate with the rest of the world with very few real IP
   addresses. With the combination of CIDR and NAT applied across the
   entire space, most of the 32-bit address space is effectively used
   as network ID.

   With the traditional CIDR based hierarchy, a node with a shorter
   prefix (a larger block) can be divided into a number of nodes with
   longer prefixes (smaller blocks). Each of these nodes can be
   subdivided further into nodes with still longer prefixes. This
   process can continue until no further division is possible. The
   point worth noting is that at each step the designer of the network
   has to anticipate the future expansion of the network, on the
   assumption that the resource must never be exhausted. This leads
   the designer to allocate much more than is actually needed, which
   leaves a large amount of address space unused. The problem is
   aggravated once a resource does get exhausted. For example, a node
   with prefix /16 can be divided into a number of nodes with prefix
   /24. If any one of the /24 nodes is exhausted, the resources of the
   other /24 nodes cannot be used for it even if they are available.
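
   The /16 and /24 example above can be illustrated with a small
   sketch. It is only an illustration under stated assumptions: the
   block 10.1.0.0/16, the demand figures and the helper name
   stranded_capacity are hypothetical and do not come from this
   document.

```python
import ipaddress

def stranded_capacity(children, demand):
    """Serve each demand only from its own fixed child block.

    Returns (unmet, idle): addresses that could not be served and
    addresses left unused in the other child blocks.
    """
    unmet = idle = 0
    for net, need in zip(children, demand):
        capacity = net.num_addresses
        if need > capacity:
            unmet += need - capacity   # this child is exhausted
        else:
            idle += capacity - need    # free space that cannot be lent
    return unmet, idle

# Hypothetical parent block, divided into fixed /24 children.
parent = ipaddress.ip_network("10.1.0.0/16")
children = list(parent.subnets(new_prefix=24))

# Assumed demand: one site needs 300 addresses, the rest need 10 each.
demand = [300] + [10] * (len(children) - 1)
unmet, idle = stranded_capacity(children, demand)
print(len(children), unmet, idle)
```

   With these assumed figures one child falls short while most of the
   parent block remains idle, which is the situation the paragraph
   above describes.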

   In the IPv4 environment, service providers rely heavily on NAT to
   provide Internet services. For example, a large educational
   institute meets its current requirement with 4 real IP addresses:
   one for its mail server, one for its web server, one for its ftp
   server and one for its proxy server, which provides web based
   services to all of its users. In general, these services are used
   by an organization of any size (it may have 400 or even 40000
   users). In the current scenario, the CIDR based tree has been built
   using these components together. When private IP is replaced with
   real IP, each customer network will require IP addresses based on
   its size and requirement.

   Transitioning from private IP to real IP basically requires the
   following components:

   o  A solution for site multihoming with provider assigned
      address space
   o  A strategy to replace private IP with real IP
   o  A solution to uniquely identify a host in a real IP environment
   o  A solution to make individual nodes and routers/switches work
      with IPv4 and next generation IP simultaneously.

   A solution for site multihoming has been provided in a separate
   document [8]. Section 8 shows how to make a transition from private
   IP space to real IP space with provider assigned addresses, using
   the CIDR based approach itself, without reorganizing the existing
   provider network. Section 4 provides a solution for identifying a
   host uniquely with a number in a real IP environment. RFC 4213 [1]
   has already described the transition mechanism from IPv4 to IPv6
   for individual nodes and routers.

   Transitioning to real IP will eliminate the extra routing entries
   associated with multihomed sites and thus will reduce the size of
   the BGP table substantially. Assignment of addresses requires an
   architectural framework.
   It may continue with the existing CIDR based architecture (provided
   that transitioning to real IP is good enough to handle all routing
   related issues for the foreseeable future) or it may adopt a
   different approach. A mesh structured hierarchy will reduce the
   growth of routing entries in a CIDR based environment and will also
   be convenient for distributing network resources in a suitable
   manner in the long run.

   This document also tries to resolve and improve on several issues
   that were carried along as part of the deployment of IPv6. It shows
   that a 64-bit address space is good enough for all practical
   purposes. Building on the useful work done through IPv6, it
   provides the necessary inputs from which a specification of IP with
   a 64-bit address space may emerge.

3. A Three-tier mesh structured hierarchical network

   As Autonomous Systems of various sizes are supported, Autonomous
   Systems and the nodes inside them can be viewed as lying on the
   same plane within the address space. If the network can be viewed
   as lying on different planes, routing issues can be made simpler.
   If the network is designed with a fixed prefix length for the
   Autonomous System everywhere, routing information for the rest is
   confined to the other part of the network prefix. This means that
   the maximum AS size is assigned to every AS irrespective of its
   actual size. This becomes possible by taking advantage of a large
   address space and dividing it into a number of regions of fixed
   size. Thus the entire network can be viewed as a network of
   inter-AS layer nodes. Each node in the inter-AS layer can act only
   as a router in the inter-AS layer, or as a router in the inter-AS
   layer with an Autonomous System attached to it through a single
   point of attachment, or as an Autonomous System with multiple
   Autonomous System border routers (ASBR) appearing like a mesh.
   Thus a two-tier mesh structured hierarchy is established between
   the AS layer and the inter-AS layer, with each AS having a fixed
   prefix length.

   By definition, an Autonomous System is a small area within the
   entire network that maintains its own independent identity and
   communicates with the rest of the world through some specific
   border routers. In a similar manner, if a larger area (say, a
   region or state) can be considered as a network of Autonomous
   Systems that maintains its own identity by communicating with the
   rest of the world through some border routers (say, state border
   routers), a mesh structured hierarchy can be established within the
   inter-AS layer. The inter-AS layer will be split into inter-AS-top
   and inter-AS-bottom. To maintain this hierarchy, each node of
   inter-AS-top needs to have multiple regional or state border
   routers (say, SBR) through which it communicates with the rest of
   the world, in the same way that an Autonomous System maintains
   ASBRs. Thus, the entire network will appear as a network of nodes
   of the inter-AS-top layer. To maintain the hierarchy, each node of
   inter-AS-top needs to have a fixed prefix length, i.e. each node of
   inter-AS-top will be assigned a maximum (fixed) number of
   Autonomous System nodes.

   Thus, with a three-tier mesh structured hierarchy in the network
   layer, the network ID can be viewed as A.B.C. If pA, pB and pC are
   the prefix lengths of the inter-AS-top, inter-AS-bottom and AS
   layers respectively, there will be 2^pA nodes at the topmost layer,
   2^pB nodes at the inter-AS-bottom layer and 2^pC nodes at the AS
   layer. Thus the entire space is divided into a fixed number of
   regions and each region is divided into a fixed number of
   sub-regions. This division is expected to be made based on
   geography, population density, demand and related factors.
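
   As an illustration of the A.B.C view, the sketch below packs and
   unpacks the fields of a 64-bit address. It is a minimal sketch
   under assumptions: the widths pA=10, pB=16, pC=12 and a 24-bit
   user-id D follow the distribution suggested in section 3.2.1, the
   2-bit leading prefix follows the 00/01 split described there, and
   the bit order and the names WIDTHS, pack and unpack are
   hypothetical.

```python
# Field widths in bits, most significant field first.
# ASSUMPTION: 2-bit type prefix + pA=10 + pB=16 + pC=12 + 24-bit user-id,
# following section 3.2.1; the exact bit layout is illustrative only.
WIDTHS = (("prefix", 2), ("A", 10), ("B", 16), ("C", 12), ("D", 24))

def pack(prefix, a, b, c, d):
    """Combine the fields into one 64-bit integer, checking each width."""
    addr = 0
    for (name, width), value in zip(WIDTHS, (prefix, a, b, c, d)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        addr = (addr << width) | value
    return addr

def unpack(addr):
    """Split a 64-bit integer back into its named fields."""
    fields = {}
    for name, width in reversed(WIDTHS):
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields

# Hypothetical address: top node 5, AS 300 inside it, router 17, host 4242.
example = pack(0b00, 5, 300, 17, 4242)
print(hex(example), unpack(example))
```

   Because every field has a fixed width, a router can read the A, B
   or C part with a shift and a mask, without any longest-prefix
   comparison.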

   Let nMaxInterASTopNodes be the maximum number of nodes that can be
   assigned at the topmost layer, nMaxInterASBottomNodes the maximum
   at the inter-AS-bottom layer and nMaxASNodes the maximum at the AS
   layer, where nMaxInterASTopNodes <= 2^pA, nMaxInterASBottomNodes
   <= 2^pB and nMaxASNodes <= 2^pC.

3.1. Route propagation

   With the hierarchy established, routing information that is
   established inside a node of inter-AS-top does not need to be
   propagated to another node of inter-AS-top. The entire routing
   information of the inter-AS-top layer needs to be propagated to the
   inter-AS-bottom layer. So, each router of the inter-AS layer will
   have two tables of information, one for the inter-AS-top and
   another for the inter-AS-bottom of the inter-AS-top node that it
   belongs to. BGP (with little modification) will work very well with
   one rule applied at the SBRs: an SBR will not propagate the routing
   information of the inter-AS-bottom layer of its domain to an SBR of
   a neighboring domain, i.e. the SBR of one top layer node will
   propagate only inter-AS-top routing information to the SBR of
   another top layer node. Inside a node of inter-AS-top, the routing
   information of both inter-AS-top and inter-AS-bottom needs to be
   propagated from one ASBR to a neighboring ASBR. Inside a top layer
   node A, the routing information for another top layer node B will
   have two parts: one for the list of SBRs through which a packet
   will traverse from top layer node A to B, and another for the list
   of ASBRs through which the packet will traverse from one AS to
   another inside A. In terms of BGP, the AS_PATH attribute will be
   split into two parts: one for the information of the top layer and
   another for the bottom layer. Within the same node A, the routing
   information from one AS to another AS will not have any top layer
   information, i.e. the top layer information will be set to NULL.
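
   The SBR rule and the split AS_PATH can be sketched as follows.
   This is not an extension defined by BGP; the Route record, its
   field names and the function export_to_foreign_sbr are hypothetical
   and serve only to illustrate the filtering described above. For
   brevity, paths are modeled as tuples of node numbers rather than
   lists of individual SBRs and ASBRs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Route:
    top: int                      # destination top-layer node (A part)
    bottom: Optional[int]         # destination AS (B part); None = whole node
    top_path: Tuple[int, ...]     # top-layer half of the split AS_PATH
    bottom_path: Tuple[int, ...]  # bottom-layer half of the split AS_PATH

def export_to_foreign_sbr(own_top: int, table: List[Route]) -> List[Route]:
    """Routes an SBR of node own_top sends to an SBR of another top node.

    Inter-AS-bottom routes are never exported; the node itself is
    advertised as a single top-layer destination.
    """
    exported = [Route(own_top, None, (own_top,), ())]
    for route in table:
        if route.bottom is not None:
            continue              # bottom-layer detail stays inside the node
        if route.top == own_top:
            continue              # already covered by the entry above
        exported.append(
            Route(route.top, None, (own_top,) + route.top_path, ())
        )
    return exported

# Hypothetical table held inside top-layer node 7.
table = [
    Route(7, 12, (), (12,)),      # AS 12 in the same node: top part is NULL
    Route(7, 40, (), (12, 40)),   # AS 40 reached through AS 12
    Route(9, None, (9,), ()),     # top-layer node 9, learned from its SBR
]
advertised = export_to_foreign_sbr(7, table)
print(advertised)
```

   In this sketch the empty tuple plays the role of the NULL top layer
   information for routes that stay inside one top layer node.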

   Similarly, each node of the AS layer will have three tables of
   routing entries: one for the inter-AS-top, one for the
   inter-AS-bottom and another for the routing information inside the
   Autonomous System itself.

   Introduction of hierarchy at the inter-AS layer reduces the size of
   the routing table substantially. If hardware resources allow a
   flat address space to be maintained at each layer, the problems
   related to CIDR can be avoided. With a flat address space, no
   hierarchical relationship needs to be established between any two
   nodes in the same layer. So, all the nodes inside each layer can be
   used until they are exhausted. With a flat address space (i.e.
   without prefix reduction), BGP tables will have at most
   nMaxInterASTopNodes + nMaxInterASBottomNodes entries.

   An IGP like OSPF has a provision to divide an AS into smaller
   areas. OSPF hides the topology of an area from the rest of the
   Autonomous System. This information hiding enables a significant
   reduction in routing traffic. With the support of subnetting, OSPF
   attaches an IP address mask to indicate the range of IP addresses
   described by a particular route. This reduces the size of the
   routing traffic, since not all the nodes inside the area have to be
   described, but it introduces another level of hierarchy. If
   subnetting can be avoided in the AS layer (with the additional
   overhead of computation inside the SPF tree), each area can be
   configured dynamically from a free pool of addresses based on its
   requirement. So, an AS can be divided into a number of areas of
   heterogeneous sizes with nodes from a free pool of address space.

   Similarly, the concept of area can be introduced in the
   inter-AS-bottom layer the way it works in OSPF. The area border
   routers in the inter-AS-bottom layer have to behave exactly the way
   an ABR behaves in OSPF, i.e. an area border router will hide the
   topology inside an area from the rest of the world and will
   distribute the information collected inside the area to the rest.
   It will also distribute the routing information collected from
   outside to the nodes inside. In order to implement this, the
   protocol running in the inter-AS layer (say, BGP) will have to
   introduce a 'cost' factor. This cost factor can be interpreted as
   the cost of propagating a packet from one AS to another. The
   protocols running inside the AS layer (RIP, OSPF, etc.) will have
   to supply the cost of a packet traveling from one ASBR to another.
   All the protocols must behave in unison in supplying this
   information. The cost factor is needed by a remote node when it
   sends a packet to a node inside an area and more than one area
   border router is equidistant from that remote node. Thus the
   inter-AS-bottom layer (i.e. one inter-AS-top level node) can be
   divided into a number of areas of heterogeneous sizes with AS nodes
   from a free pool of address space. BGP adopts a technique called
   route aggregation, which reduces the routing information within a
   message. In a similar manner, the introduction of areas inside the
   inter-AS-bottom layer will not only reduce the complexity of the
   protocol, but will also reduce the size of a BGP packet
   substantially.

   With this architecture, each node (router) inside an AS is
   represented as A.B.C. Each node may or may not be attached to a
   network which acts as a leaf node (i.e. the network will not act as
   a transit). In order to use the user-id space properly and to
   support customer networks of heterogeneous sizes, the user-ID space
   needs to be divided into subnet-ID and user-ID. Essentially, a VLSM
   (variable length subnet mask) type of approach (in the form of a
   tree) has to be adopted at each node of an AS.
   So, each node of the AS layer will act as the root of a tree whose
   leaves are independent small customer networks which act as stubs.
   As the routing information of the inter-AS layer and the AS layer
   need not be passed to any node inside the VLSM tree, each router
   inside the tree should maintain a default route for any address
   outside its network/domain. With this approach, the load on each
   router of the service providers will become negligible. Protocols
   that support VLSM with MPLS/VPN have to be implemented inside the
   tree. Inside the VLSM tree, all the physical ports of a switch have
   to be configured with the subnet mask. A lightweight routing
   protocol can be developed on top of a static routing table by
   setting a default route inside the VLSM tree.

   The fundamental assumptions on which this architecture rests can
   be summarized as follows:

   i) The entire network can be viewed as a network of regions or
   states where each region or state can have its own identity by
   communicating with the rest of the world through some state border
   routers. Each region or state is a network of Autonomous Systems.
   Each region, as well as each Autonomous System inside it, will have
   a fixed (maximum) prefix length.

   ii) Hardware resources are available such that a flat address
   space can be maintained at the inter-AS layer.

   Introduction of a mesh structured hierarchy will have several
   advantages:

   o  Load at each router will be reduced substantially.
   o  The CIDR style approach and the complexity related to
      prefix reduction can be avoided.
   o  Mesh structured hierarchy will distribute traffic evenly.
   o  Physical cable connections can be optimized.
   o  Administrative issues will become easier.

3.2. Determination of prefix lengths

   With this architecture, an IP address can be described as A.B.C.D
   where the D part represents the user id.
   Each router in the inter-AS layer will have two tables of
   information, one for the inter-AS-top and another for the
   inter-AS-bottom of the inter-AS-top node that it belongs to. Each
   node of the AS layer, on the other hand, will have three tables of
   routing entries: one for the inter-AS-top, one for the
   inter-AS-bottom and another for the routing information inside the
   Autonomous System itself. In the worst case, a node inside an AS
   needs to maintain nMaxInterASTopNodes + nMaxInterASBottomNodes +
   nMaxASNodes entries in its routing table.

   Dynamic allocation of an area from a free pool of address space
   will be more frequent at the AS layer than at the inter-AS-bottom
   layer. As OSPF supports all the features needed, it can be
   considered the default choice in the AS layer. The existing
   implementation of OSPF (Version 2) supports subnetting, by which an
   entire area can be represented as a combination of network address
   and subnet mask. With this approach, the routing table is reduced
   substantially. With the removal of subnetting, every node inside an
   area will have an entry in the routing table (OSPF Version 1). So
   the determining factor is the maximum number of nodes inside an AS
   that OSPF can support once subnetting support is removed. The
   prefix length of the AS layer will be determined by this factor.

   With the introduction of hierarchy in the inter-AS layer, the
   number of entries in the BGP routing table will be reduced
   substantially. Even if pA and pB are both selected as 16, the
   number of routing entries comes within the admissible range of the
   existing BGP protocol. But it is the responsibility of IANA to come
   up with a scheme for how nMaxInterASTopNodes and
   nMaxInterASBottomNodes are to be selected. Each top level node will
   have nMaxInterASBottomNodes nodes. It will be a waste of address
   space if each country is assigned a top level node (e.g. China has
   a population of 1,306,313,800 people whereas Vatican City has only
   920, according to a census of 2006). So a moderate value of
   nMaxInterASBottomNodes is desirable, with which larger countries
   will have a number of top level nodes; e.g. each state of the USA
   can be assigned a top level node. With the introduction of areas in
   the inter-AS-bottom layer, each top level node can be divided into
   a number of areas of heterogeneous sizes. So, a group of
   neighboring countries with smaller populations can share the
   address space of a top level node. Similarly, the size of the
   user-id space has to be decided based on the largest area that a
   VLSM tree should span. All these issues are entirely geo-political
   and have to be decided by IANA.

3.2.1. A pseudo optimal distribution of prefixes in a 64 bit
       architecture

   In order to make optimal use of cable connections, the length of
   the VLSM tree is expected to be as short as possible. Also, any
   single organization may prefer to have its user-id space under the
   same network id. A 16-bit user-id may be insufficient for places
   like a large university campus, whereas 32 bits will be too large.
   Hence, a 24-bit user-id is a moderate choice; it is the size of the
   class A address space in IPv4 (also used as the space for private
   IP). As published in 1998 [6], OSPF can support an area with 1600
   routers and 30K external LSAs. So, 11 bits are needed to support
   this space. On the assumption that OSPF can support a much larger
   address space with the advancement of hardware technology, and to
   keep room for future expansion, 12 bits are assigned for the AS
   layer. 16 bits are assigned for the inter-AS-bottom layer. So, if
   on average a 16-bit equivalent space is used within the user-id
   space (i.e. one out of 256) and an 8-bit equivalent number of nodes
   is used inside an AS (16% of 1600), a top level node (with a 16-bit
   equivalent number of AS nodes) will generate 2^40 IP addresses.
   This gives 8629 IP addresses per person in Japan (with a population
   of 127417200; Japan is at the 10th position from the top in the
   population list of the world). So, even if every country with a
   population less than or equal to that of Japan is assigned a top
   level node, and every province/state of the countries with larger
   populations is assigned a top level node, the total number of nodes
   will come well under 1024. If a number of neighboring countries
   with smaller populations share a top level node, the total number
   of top level nodes will come down further. This suggests that a
   62-bit equivalent (10(pA)+16(pB)+12(pC)+24(user-id)) space will be
   good enough for unicast addresses. This distribution expects OSPF
   to support 65K (64K+1K) external LSAs.

   The distribution of address space will be finalized in consultation
   with IANA. Tentatively, it may appear as follows:

   The 64-bit address space may be divided into two 63-bit blocks:

   i. Global unicast addresses with the most significant bit set to 0.
   This space is equally divided between provider assigned (PA)
   address space and provider independent (PI) address space.

      a) Provider assigned address space with prefix 00.

      b) Provider independent (PI) address space with prefix 01.
         Provider independent address space will be used for
         customers who would like to retain their number even after
         changing their providers. As routing will be based on PA
         addresses, each PI address will be associated with at least
         one PA address. The most significant aspect of PI addressing
         is that it is independent of the architectural framework of
         the provider network; even if the architectural framework
         changes, the same format of PI addressing can be maintained.
         Once implemented, the PI address of a node will be the
         number generally used by common people. Section 4 describes
         issues related to PI addressing in detail.

   ii. Address space with the MSB set to 1 will be distributed among
   the remaining uses. Each of them will have a fixed prefix. This
   distribution will be based on the requirements and on the work
   that has already been done in connection with IPv6:

      a) Address space for multicasting, with prefix 1111.

      b) Address space for link-local addresses: link-local addresses
         will have prefix 1110.

      c) Router address space: this space will be used by the routers
         and will have prefix 1101.

      d) Address space for private IP: each customer network can
         maintain a private address space for communication among its
         users. This space will be distributed among all the customer
         sites of a corporation that maintains VPN services. A 32-bit
         address space should be good enough for private IP. Private
         address space will have a 32-bit prefix with the leading 4
         bits set to 1100 and the rest set to 1.

   The rest of the address space has been kept for future use.

3.2.2. Whether to go for a two-tier or three-tier hierarchy

   Establishment of hierarchy in the inter-AS layer reduces the number
   of BGP entries to a great extent, but leads to inefficient use of
   address space for geo-political reasons. If the hierarchy in the
   inter-AS space is removed, the entire 26-bit (10+16) space will be
   available for a single layer and the inter-AS space will be fully
   usable, but the number of external LSAs (and/or the number of
   entries in the BGP table) will increase dramatically. So, it
   depends on the extent to which OSPF can support external LSAs. BGP
   expects the packet length to be limited to 4096 bytes. BGP manages
   to work within this limitation by means of prefix reduction in the
   CIDR based environment.
   As the number of inter-AS nodes increases, BGP has to change this
   limit in order to work in a flat address space. The alternative is
   to divide the inter-AS space into a number of areas as described in
   section 3.1. The area border routers will advertise the aggregated
   information to the rest of the world. BGP may have to incorporate
   both options at the same time. As the number of nodes in the
   inter-AS layer increases, the inter-AS space has to be split into
   two separate planes in order to reduce the number of entries in the
   routing table. So, a two-tier hierarchy can be considered an
   interim state on the way to a three-tier hierarchy. If the
   currently available data turns out to be good enough to support the
   present need, it will be worth examining to what extent it can
   support future needs. Assignment of inter-AS nodes in a two-tier
   hierarchy should be based on geographical distribution, as if it
   were part of a three-tier hierarchy. Otherwise, introducing a
   three-tier hierarchy in the future will become another difficult
   task. According to a report from 2011, the BGP routing table
   carries ~400,000 entries. With this growing trend, BGP may have to
   change the limit on packet length even in a CIDR based environment.
   With the introduction of a two-tier hierarchy, the number of
   entries in the routing table will come down drastically, and with
   the three-tier approach it will come down further.

3.3. Issues related to Satellite communications

   Establishment of hierarchy in the inter-AS layer assumes that the
   only way any two autonomous systems in two different top level
   nodes communicate is through their SBRs. If two autonomous systems
   inside the same top level node communicate through satellite, it
   will be considered a direct link between them.
Whenever autonomous system
527   'ASa' of top level node 'A' communicates with autonomous system 'ASb'
528   of top level node 'B' through satellite, they have to go through
529   their state border routers; i.e., a satellite port inside 'A' that
530   communicates with a satellite port inside 'B' will be considered a
531   state border router. If multiple such ports exist inside node 'A',
532   all of them will be equidistant from any port inside 'B'. This
533   requires any satellite port inside 'B' to have prior knowledge of the
534   list of autonomous systems that are under the purview of each port
535   inside 'A'. So, all the satellite ports of 'A' have to exchange this
536   information with all the satellite ports of 'B' and vice
537   versa. Each such group of autonomous systems can be considered a
538   cluster of autonomous systems inside an area of a top level node. If
539   the number of such ports is small, some heuristics can be applied
540   while assigning AS numbers in order to reduce the processing time
541   during the circuit establishment phase. It will become difficult to
542   maintain such heuristics once the number of such ports becomes large.
543   So, in the case of satellite communication, the advantage of
544   establishing a hierarchy inside the inter-AS layer diminishes as the
545   number of satellite ports increases. If a private corporation
546   maintains its own satellite channel to communicate between its
547   offices at distant locations, all of these offices will be considered
548   to be under the user-id space of its network. Service providers that
549   provide satellite services to end-site customers can operate in the
550   usual manner, as they will provide connections to customer networks
551   that act as stubs.

553   3.4. Setting default route inside VLSM tree

555   Section 3.1 describes that there is no need to pass down the routing
556   information of the external world inside a VLSM tree that acts as a
557   stub.
Inside a VLSM tree, a node holding a larger address block (a shorter
558   prefix) can be divided into a number of nodes holding smaller blocks
559   (longer prefixes). Each of these nodes can be subdivided further in
560   the same way. This process can continue as long as desired or until
561   no further division is possible.

563   The following figure shows a typical arrangement of the VLSM tree of
564   a service provider's network with IPv4 address space. Switch SW-A is
565   connected to the outside world and maintains the global routing
566   table. It is the root of a VLSM tree that acts as a stub. It has been
567   assigned the address block 11.1.16.0/20, which is distributed among
568   its four children SW-B, SW-C, SW-D and SW-E using VLSM.
569   Switch SW-B further divides its address space between switches SW-F
570   and SW-G. Switch SW-F assigns the address block 11.1.16.0/24 to
571   customer network CN-A. Switch SW-G assigns the address blocks
572   11.1.20.0/24 and 11.1.21.0/24 to two customers, CN-B and CN-C, whereas
573   switch SW-E assigns the block 11.1.30.0/24 to customer network CN-D.

575   Routing inside the tree follows this principle:

577   Inside the tree, if a node (switch/router) that is assigned a domain
578   (NetAddr/NetMask) receives a packet destined for an address outside
579   its domain, it needs to forward the packet to its parent in
580   the hierarchy.
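
As an illustration only, the forwarding principle above can be sketched in Python using the address blocks of the example tree in this section. The Node class, the forward() function and the "GLOBAL" and "ICMP-UNREACHABLE" markers are assumptions made for this sketch and are not part of the framework. Customer networks are modelled as leaf nodes, and a node without children is treated as the delivery point.

```python
import ipaddress


class Node:
    """A switch or customer network that owns one address block (hypothetical model)."""

    def __init__(self, name, block, parent=None):
        self.name = name
        self.block = ipaddress.ip_network(block)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def forward(node, dst):
    """Return the hops a packet takes from `node` towards address `dst`."""
    dst = ipaddress.ip_address(dst)
    path = [node.name]
    # Destination outside the node's domain: hand the packet to the parent.
    while dst not in node.block:
        if node.parent is None:
            path.append("GLOBAL")  # the root consults the global forwarding table
            return path
        node = node.parent
        path.append(node.name)
    # Destination inside the domain: follow the child whose block matches.
    while node.children:
        child = next((c for c in node.children if dst in c.block), None)
        if child is None:
            path.append("ICMP-UNREACHABLE")  # no port configured for this address
            return path
        node = child
        path.append(node.name)
    return path


def build_example_tree():
    """Build the tree of the figure and return its customer networks by name."""
    sw_a = Node("SW-A", "11.1.16.0/20")
    sw_b = Node("SW-B", "11.1.16.0/21", sw_a)
    Node("SW-C", "11.1.24.0/22", sw_a)
    Node("SW-D", "11.1.28.0/23", sw_a)
    sw_e = Node("SW-E", "11.1.30.0/23", sw_a)
    sw_f = Node("SW-F", "11.1.16.0/22", sw_b)
    sw_g = Node("SW-G", "11.1.20.0/22", sw_b)
    return {
        "CN-A": Node("CN-A", "11.1.16.0/24", sw_f),
        "CN-B": Node("CN-B", "11.1.20.0/24", sw_g),
        "CN-C": Node("CN-C", "11.1.21.0/24", sw_g),
        "CN-D": Node("CN-D", "11.1.30.0/24", sw_e),
    }


nets = build_example_tree()
print(forward(nets["CN-A"], "11.1.21.116"))  # CN-A, SW-F, SW-B, SW-G, CN-C
print(forward(nets["CN-B"], "11.1.17.120"))  # ends with ICMP-UNREACHABLE at SW-F
print(forward(nets["CN-C"], "16.2.22.116"))  # climbs to SW-A, then GLOBAL
```

No node in the sketch holds a route for anything outside its own block; the climb to the parent plays the role of the default route.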
582                            +--------------+
583                            |     SW-A     |
584                            | 11.1.16.0/20 |
585                            +-+-+------+-+-+
586                              | |      | |
587              +---------------+ |      | +----------------+
588              |                 |      |                  |
589       +------+-----+ +---------+--+ +-+----------+ +-----+------+
590       |    SW-B    | |    SW-C    | |    SW-D    | |    SW-E    |
591       |11.1.16.0/21| |11.1.24.0/22| |11.1.28.0/23| |11.1.30.0/23|
592       +---+----+---+ +------------+ +------------+ +--+---------+
593           |    |                                      |
594           |    +-------+                              |
595           |            |                           +--+--+
596   +-------+----+  +----+-------+                   |CN-D |
597   |    SW-F    |  |    SW-G    |                   +-----+
598   |11.1.16.0/22|  |11.1.20.0/22|                11.1.30.0/24
599   +--+---------+  +--+------+--+
600      |               |      |
601      |               |      |
602   +--+--+         +--+--+ +-+---+
603   |CN-A |         |CN-B | |CN-C |
604   +-----+         +-----+ +-----+
605   11.1.16.0/24 11.1.20.0/24 11.1.21.0/24

607   If a host in CN-A wants to send a packet to the address 11.1.21.116,
608   the CE router of CN-A forwards it to SW-F. SW-F finds the destination
609   address of the packet to be outside its domain and forwards the
610   packet to its parent SW-B. SW-B finds a port that has been
611   configured with the matching destination address and forwards it to
612   its child SW-G. Switch SW-G sends the packet to customer network
613   CN-C, whose block 11.1.21.0/24 contains the destination.

615   If a host in CN-B wants to send a packet to 11.1.17.120, the CE router
616   of CN-B forwards the packet to SW-G. SW-G finds the destination
617   address of the packet to be outside its domain and forwards the
618   packet to its parent SW-B. SW-B finds a port that has been configured
619   with the matching destination address and forwards the packet to its
620   child SW-F. SW-F finds the destination address to be within its
621   domain, but no port has been configured with the matching destination
622   address, so it generates ICMP UNREACHABLE.

624   If a host in CN-C wants to send a packet to 16.2.22.116, the CE router
625   of CN-C forwards the packet to SW-G. SW-G finds the destination
626   address of the packet to be outside its domain and forwards the
627   packet to SW-B. SW-B forwards the packet to its parent SW-A.
SW-A finds the
628   destination address of the packet to be outside its domain,
629   consults the global forwarding table and forwards the packet
630   through the right port.

632   3.4.1. IP VPN with MPLS inside VLSM tree

634   Section 3.1 describes that there is no need to pass down the routing
635   information of the external world inside a VLSM tree. This section
636   describes how to make IP VPN work inside a VLSM tree without using BGP.

638   RFC4364 [7] describes "IP VPN" with BGP/MPLS. To support VPN, PE
639   routers maintain a per-site forwarding table. When a packet arrives
640   from an associated CE router, the PE router consults this forwarding
641   table to forward the packet. If the packet is to be forwarded to
642   another site of the VPN through the backbone, it uses a two-level
643   label stack. The upper label is used to forward the packet from the
644   ingress PE router to the egress PE router, whereas the inner label
645   is used by the egress PE router to identify the associated CE router
646   to which the packet is to be forwarded. BGP is used by the
647   Service Provider to exchange the routes of a particular VPN among the
648   PE routers that are attached to that VPN. Configuration takes place
649   on the PE routers on both sides of the LSP. The simplest way to
650   achieve this is to configure these attributes manually on the PE
651   routers. In order to have dynamic allocation of the inner label, MPLS
652   signaling protocols (in place of BGP) need to be extended. Allocation
653   of the inner label has to be done by the egress PE router. The same
654   message that is used for the assignment of the upper label may be
655   used for the assignment of the inner label. Inside the forwarding
656   table, each entry maps a set of destination addresses
657   (NetAddress/NetMask) of the IP packets received from the ingress CE
658   router to a forwarding destination address.
While establishing the inner label, the ingress PE router
659   needs to send these attributes with the signaling message, and the
660   egress PE router needs to validate them before assigning the label.

662   3.4.1.1. Extension to RSVP-TE to support IP VPN inside VLSM tree

664   This section describes an extension to RSVP-TE [17] to support dynamic
665   allocation of the inner label of the two-level label stack used to
666   support VPN services.

668   In order to establish an LSP using RSVP-TE, the ingress PE router sends
669   a Path message to the egress PE router. The Path message is augmented
670   with a LABEL_REQUEST object. Labels are allocated downstream and
671   distributed (propagated upstream) by means of the RSVP Resv message.
672   For this purpose, the RSVP Resv message is extended with a special
673   LABEL object. To establish the inner label for a VPN, the Path
674   message is augmented with a VPN_ATTRIBUTE object. Similarly, the RSVP
675   Resv message is extended with a VPN_LABEL object. When an egress PE
676   router receives a Path message, it checks for the presence of a
677   VPN_ATTRIBUTE object. On finding this object, the egress PE router
678   checks whether a VPN label can be assigned, comparing the parameters
679   from the VPN_ATTRIBUTE object with the attributes already configured
680   on the egress PE router. If the check succeeds, it assigns a VPN
681   label, does the rest of the processing of LSP label assignment and
682   sends the RSVP Resv message, extended with the VPN_LABEL object,
683   towards the ingress PE router. On receiving a Resv message with a
684   VPN_LABEL object, the ingress PE router assigns the VPN label along
685   with the rest of the processing of the Resv message and completes the
686   operation. The VPN_ATTRIBUTE and VPN_LABEL objects are described below.
688   VPN_LABEL class=, C-Type=1
689    0                   1                   2                   3
690    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
691   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
692   |                         (inner label)                         |
693   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

695   VPN_ATTRIBUTE class=, C-Type=1
696    0                   1                   2                   3
697    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
698   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
699   |          Global Unicast Address of Ingress CE Router          |
700   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
701   |          Global Unicast Address of Egress CE Router           |
702   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
703   |             Net Address of Destination IP Packet              |
704   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
705   |               Net Mask of Destination IP Packet               |
706   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

708   The format of the Path message is as follows:

710   <Path Message> ::= <Common Header> [ <INTEGRITY> ]
711                      <SESSION> <RSVP_HOP>
712                      <TIME_VALUES>
713                      [ <EXPLICIT_ROUTE> ]
714                      <LABEL_REQUEST>
715                      [ <SESSION_ATTRIBUTE> ]
716                      [ <VPN_ATTRIBUTE> ]
717                      [ <POLICY_DATA> ... ]
718                      <sender descriptor>

720   <sender descriptor> ::= <SENDER_TEMPLATE> <SENDER_TSPEC>
721                           [ <ADSPEC> ]
722                           [ <RECORD_ROUTE> ]

724   The format of the Resv message is as follows:

726   <Resv Message> ::= <Common Header> [ <INTEGRITY> ]
727                      <SESSION> <RSVP_HOP>
728                      <TIME_VALUES>
729                      [ <RESV_CONFIRM> ] [ <SCOPE> ]
730                      [ <POLICY_DATA> ... ]
731                      [ <VPN_LABEL> ]
732                      <STYLE> <flow descriptor list>
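
As a rough illustration, the two objects defined in this section could be serialized as shown below. The sketch assumes the standard RSVP object header of RFC 2205 (16-bit length, 8-bit Class-Num, 8-bit C-Type) and 32-bit IPv4 fields as drawn in the diagrams. The Class-Num values are not given above, so the value 0xFD used here is only a placeholder, and the function names and CE router addresses are hypothetical.

```python
import ipaddress
import struct

C_TYPE = 1                # C-Type of both objects, as given in the definitions
PLACEHOLDER_CLASS = 0xFD  # assumption: the real Class-Num is not specified


def encode_vpn_attribute(class_num, ingress_ce, egress_ce, dest_net):
    """Build a VPN_ATTRIBUTE object: 4-byte RSVP header + four 32-bit fields."""
    net = ipaddress.IPv4Network(dest_net)
    body = (ipaddress.IPv4Address(ingress_ce).packed
            + ipaddress.IPv4Address(egress_ce).packed
            + net.network_address.packed
            + net.netmask.packed)
    # The RSVP length field counts the header as well as the body.
    return struct.pack("!HBB", 4 + len(body), class_num, C_TYPE) + body


def decode_vpn_attribute(data):
    """Return (ingress CE, egress CE, destination network) from the object."""
    if len(data) != 20:
        raise ValueError("VPN_ATTRIBUTE object must be 20 bytes long")
    length, _class_num, c_type = struct.unpack("!HBB", data[:4])
    if length != 20 or c_type != C_TYPE:
        raise ValueError("malformed VPN_ATTRIBUTE object")
    ingress, egress, addr, mask = (
        ipaddress.IPv4Address(data[i:i + 4]) for i in range(4, 20, 4))
    return ingress, egress, ipaddress.IPv4Network(f"{addr}/{mask}")


def encode_vpn_label(class_num, inner_label):
    """Build a VPN_LABEL object: 4-byte RSVP header + one 32-bit inner label."""
    return struct.pack("!HBBI", 8, class_num, C_TYPE, inner_label)


# Hypothetical CE addresses taken from the example address blocks of section 3.4.
attr = encode_vpn_attribute(PLACEHOLDER_CLASS, "11.1.16.1", "11.1.30.1",
                            "11.1.30.0/24")
print(len(attr), decode_vpn_attribute(attr))
print(encode_vpn_label(PLACEHOLDER_CLASS, 1001).hex())
```

An egress PE router could decode the VPN_ATTRIBUTE object this way and compare the result with its configured attributes before returning a VPN_LABEL object. How that comparison is done is left open here, as it is in the text above.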