idnits 2.17.1 draft-irtf-routing-history-10.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([I-D.irtf-routing-reqs]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 16, 2009) is 5540 days in the past. Is this intentional? 
Checking references for intended status: Historic ---------------------------------------------------------------------------- == Outdated reference: A later version (-11) exists of draft-ietf-bfd-base-09 == Outdated reference: A later version (-11) exists of draft-irtf-routing-reqs-10 -- Obsolete informational reference (is this intentional?): RFC 1105 (Obsoleted by RFC 1163) -- Obsolete informational reference (is this intentional?): RFC 1163 (Obsoleted by RFC 1267) -- Obsolete informational reference (is this intentional?): RFC 1771 (Obsoleted by RFC 4271) -- Obsolete informational reference (is this intentional?): RFC 2362 (Obsoleted by RFC 4601, RFC 5059) -- Obsolete informational reference (is this intentional?): RFC 4601 (Obsoleted by RFC 7761) -- Obsolete informational reference (is this intentional?): RFC 4893 (Obsoleted by RFC 6793) Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group E. Davies 3 Internet-Draft Folly Consulting 4 Intended status: Historic A. Doria 5 Expires: August 20, 2009 LTU 6 February 16, 2009 8 Analysis of Inter-Domain Routing Requirements and History 9 draft-irtf-routing-history-10.txt 11 Status of this Memo 13 This Internet-Draft is submitted to IETF in full conformance with the 14 provisions of BCP 78 and BCP 79. 16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is inappropriate to use Internet-Drafts as reference 24 material or to cite them other than as "work in progress." 
26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt. 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 This Internet-Draft will expire on August 20, 2009. 34 Copyright Notice 36 Copyright (c) 2009 IETF Trust and the persons identified as the 37 document authors. All rights reserved. 39 This document is subject to BCP 78 and the IETF Trust's Legal 40 Provisions Relating to IETF Documents 41 (http://trustee.ietf.org/license-info) in effect on the date of 42 publication of this document. Please review these documents 43 carefully, as they describe your rights and restrictions with respect 44 to this document. 46 Abstract 48 This document analyses the state of the Internet domain-based routing 49 system, concentrating on Inter-Domain Routing (IDR) and also 50 considering the relationship between inter-domain and intra-domain 51 routing. The analysis is carried out with respect to RFC 1126 and 52 other IDR requirements and design efforts looking at the routing 53 system as it appeared to be in 2001 with editorial additions 54 reflecting developments up to 2006. It is the companion document to 55 "A Set of Possible Requirements for a Future Routing Architecture" 56 [I-D.irtf-routing-reqs], which is a discussion of requirements for 57 the future routing architecture, addressing systems developments and 58 future routing protocols. This document summarizes discussions held 59 several years ago by members of the IRTF Routing Research Group (IRTF 60 RRG) and other interested parties. The document is published with 61 the support of the IRTF RRG as a record of the work completed at that 62 time, but with the understanding that it does not necessarily 63 represent either the latest technical understanding or the technical 64 consensus of the research group at the date of publication. 
65 [Note to RFC Editor: Please replace the reference in the abstract 66 with a non-reference quoting the RFC number of the companion 67 document when it is allocated, i.e., '(RFC xxxx)' and remove this 68 note.] 70 Table of Contents 72 1. Provenance of this Document . . . . . . . . . . . . . . . . . 4 73 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 74 2.1. Background . . . . . . . . . . . . . . . . . . . . . . . . 7 75 3. Historical Perspective . . . . . . . . . . . . . . . . . . . . 7 76 3.1. The Legacy of RFC1126 . . . . . . . . . . . . . . . . . . 7 77 3.1.1. "General Requirements" . . . . . . . . . . . . . . . . 8 78 3.1.2. "Functional Requirements" . . . . . . . . . . . . . . 13 79 3.1.3. "Non-Goals" . . . . . . . . . . . . . . . . . . . . . 21 80 3.2. ISO OSI IDRP, BGP and the Development of Policy Routing . 24 81 3.3. Nimrod Requirements . . . . . . . . . . . . . . . . . . . 30 82 3.4. PNNI . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 83 4. Recent Research Work . . . . . . . . . . . . . . . . . . . . . 32 84 4.1. Developments in Internet Connectivity . . . . . . . . . . 32 85 4.2. DARPA NewArch Project . . . . . . . . . . . . . . . . . . 33 86 4.2.1. Defending the End-to-End Principle . . . . . . . . . . 34 87 5. Existing problems of BGP and the current 88 Inter-/Intra-Domain Architecture . . . . . . . . . . . . . . . 34 89 5.1. BGP and Auto-aggregation . . . . . . . . . . . . . . . . . 34 90 5.2. Convergence and Recovery Issues . . . . . . . . . . . . . 35 91 5.3. Non-locality of Effects of Instability and 92 Misconfiguration . . . . . . . . . . . . . . . . . . . . . 36 93 5.4. Multihoming Issues . . . . . . . . . . . . . . . . . . . . 36 94 5.5. AS-number exhaustion . . . . . . . . . . . . . . . . . . . 37 95 5.6. Partitioned ASs . . . . . . . . . . . . . . . . . . . . . 38 96 5.7. Load Sharing . . . . . . . . . . . . . . . . . . . . . . . 38 97 5.8. Hold down issues . . . . . . . . . . . . . . . . . . . . . 38 98 5.9. 
Interaction between Inter-Domain Routing and 99 Intra-Domain Routing . . . . . . . . . . . . . . . . . . . 39 100 5.10. Policy Issues . . . . . . . . . . . . . . . . . . . . . . 40 101 5.11. Security Issues . . . . . . . . . . . . . . . . . . . . . 41 102 5.12. Support of MPLS and VPNS . . . . . . . . . . . . . . . . . 41 103 5.13. IPv4 / IPv6 Ships in the Night . . . . . . . . . . . . . . 42 104 5.14. Existing Tools to Support Effective Deployment of 105 Inter-Domain Routing . . . . . . . . . . . . . . . . . . . 42 106 5.14.1. Routing Policy Specification Language RPSL (RFC 107 2622, 2650) and RIPE NCC Database (RIPE 157) . . . . . 43 108 6. Security Considerations . . . . . . . . . . . . . . . . . . . 44 109 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 110 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 44 111 9. Informative References . . . . . . . . . . . . . . . . . . . . 45 112 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 49 114 1. Provenance of this Document 116 In 2001, the IRTF Routing Research Group (IRTF RRG) chairs, Abha 117 Ahuja and Sean Doran, decided to establish a sub-group to look at 118 requirements for inter-domain routing (IDR). A group of well known 119 routing experts was assembled to develop requirements for a new 120 routing architecture. Their mandate was to approach the problem 121 starting from a blank sheet. This group was free to take any 122 approach, including a revolutionary approach, in developing 123 requirements for solving the problems they saw in inter-domain 124 routing. Their eventual approach documented requirements for a 125 complete future routing and addressing architecture rather than just 126 the requirements for IDR. 128 Simultaneously, an independent effort was started in Sweden with a 129 similar goal. 
A team, calling itself Babylon, with participation 130 from vendors, service providers, and academia, assembled to 131 understand the history of inter-domain routing, to research the 132 problems seen by the service providers, and to develop a proposal of 133 requirements for a follow-on to the current routing architecture. 134 This group's remit required an evolutionary approach starting from the 135 current routing architecture and practice. In other words, the group 136 limited itself to developing an evolutionary strategy and 137 consequently assumed that the architecture would probably remain 138 domain-based. The Babylon group was later folded into the IRTF RRG 139 as Sub-group B to distinguish it from the original RRG Sub-group A. 141 This document, which was a part of Sub-group B's output, provides a 142 snapshot of the current state of Inter-Domain Routing (IDR) at the 143 time of original writing (2001) with some minor updates to take into 144 account developments since that date, bringing it up to date in 2006. 145 The development of the new requirements set is then motivated by an 146 analysis of the problems that IDR has been encountering in the recent 147 past. This document is intended as a counterpart to the Routing 148 Requirements document ("A Set of Possible Requirements for a Future 149 Routing Architecture") which documents the requirements for future 150 routing systems as captured separately by the IRTF RRG Sub-groups A 151 and B [I-D.irtf-routing-reqs]. 153 The IRTF RRG supported publication of this document as a historical 154 record of the work completed on the understanding that it does not 155 necessarily represent either the latest technical understanding or 156 the technical consensus of the research group at the time of 157 publication. The document has had substantial review by members of 158 the Babylon team, members of the IRTF RRG and others over the years. 160 2. 
Introduction 162 For the greater part of its existence the Internet has used a domain- 163 oriented routing system whereby the routers and other nodes making up 164 the infrastructure are partitioned into a set of administrative 165 domains, primarily along ownership lines. Individual routing domains 166 (also known as Autonomous Systems (ASs)), which may be a subset of an 167 administrative domain, are made up of a finite, connected set of nodes 168 (at least in normal operation). Each routing domain is subject to a 169 coherent set of routing and other policies managed by a single 170 administrative authority. The domains are interlinked to form the 171 greater Internet producing a very large network: in practice, we have 172 to treat this network as if it were infinite in extent as there is no 173 central knowledge about the whole network of domains. An early 174 presentation of the concept of routing domains can be found in Paul 175 Francis' OSI routing architecture paper from 1987 [Tsuchiya87] (Paul 176 Francis was formerly known as Paul Tsuchiya). 178 The domain concept and domain-oriented routing have become so 179 fundamental to Internet routing thinking that the concept is generally taken 180 as an axiom these days and not even defined again (cf. 181 [NewArch03]). The issues discussed in the present document 182 notwithstanding, it has proved to be a robust and successful 183 architectural concept that brings with it the possibility of using 184 different routing mechanisms and protocols within the domains (intra- 185 domain) and between the domains (inter-domain). 
This is an 186 attractive division, because intra-domain protocols can exploit the 187 well-known finite scope of the domain and the mutual trust engendered 188 by shared ownership to give a high degree of control to the domain 189 administrators, whereas inter-domain routing lives in an essentially 190 infinite region featuring a climate of distrust built on a multitude 191 of competitive commercial agreements and driven by less-than-fully 192 public policies from each component domain. Of course, like any 193 other assumption that has been around for a very long time, the 194 domain concept should be reevaluated to make sure that it is still 195 helping! 197 It is generally accepted that there are major shortcomings in the 198 inter-domain routing of the Internet today and that these may result 199 in severe routing problems within an unspecified period of time. 200 Remedying these shortcomings will require extensive research to tie 201 down the exact failure modes that lead to these shortcomings and 202 identify the best techniques to remedy the situation. By comparison, 203 intra-domain routing works satisfactorily, and issues with intra- 204 domain routing are mainly associated with the interface between 205 intra- and inter-domain routing. 207 Reviewer's Note: Even in 2001, there was a wide difference of 208 opinion across the community regarding the shortcomings of 209 interdomain routing. In the years between writing and 210 publication, further analysis, changes in operational practice, 211 alterations to the demands made on inter-domain routing, 212 modifications made to BGP and a recognition of the difficulty of 213 finding a replacement may have altered the views of some members 214 of the community. 216 Changes in the nature and quality of the services that users want 217 from the Internet are difficult to provide within the current 218 framework, as they impose requirements never foreseen by the original 219 architects of the Internet routing system. 
221 The kind of radical changes that have to be accommodated are 222 epitomized by the advent of IPv6 and the application of IP mechanisms 223 to private commercial networks that offer specific service guarantees 224 beyond the best-effort services of the public Internet. Major 225 changes to the inter-domain routing system are inevitable to provide 226 an efficient underpinning for the radically changed and increasingly 227 commercially-based networks that rely on the IP protocol suite. 229 Current practice stresses the need to separate the concerns of the 230 control plane and the forwarding plane in a router: This document 231 will follow this practice, but we still use the term 'routing' as a 232 global portmanteau to cover all aspects of the system. 234 This document provides a historical perspective on the current state 235 of inter-domain routing and its relationship to intra-domain routing 236 in Section 3 by revisiting the previous IETF requirements document 237 intended to steer the development of a future routing system. These 238 requirements, which informed the design of the Border Gateway 239 Protocol (BGP) in 1989, are contained in RFC1126 - "Goals and 240 Functional Requirements for Inter-Autonomous System Routing" 241 [RFC1126]. 243 Section 3 also looks at some other work on requirements for domain- 244 based routing that was carried out before and after RFC1126 was 245 published. This work fleshes out the historical perspective and 246 provides some additional insights into alternative approaches which 247 may be instructive when building a new set of requirements. 249 The motivation for change and the inspiration for some of the 250 requirements for new routing architectures derive from the problems 251 attributable to the current domain-based routing system that are 252 being experienced in the Internet today. These will be discussed in 253 Section 5. 255 2.1. 
Background 257 Today's Internet uses an addressing and routing structure that has 258 developed in an ad hoc, more or less upwards-compatible fashion. The 259 structure has progressed from supporting a non-commercial Internet 260 with a single administrative domain to a solution that is able to 261 control today's multi-domain, federated Internet, carrying traffic 262 between the networks of commercial, governmental and not-for-profit 263 participants. This is not achieved without a great deal of 24/7 264 vigilance and operational activity by network operators: Internet 265 routing often appears to be running close to the limits of stability. 266 As well as directing traffic to its intended end-point, inter-domain 267 routing mechanisms are expected to implement a host of domain 268 specific routing policies for competing, communicating domains. The 269 result is not ideal, particularly as regards inter-domain routing 270 mechanisms, but it does a pretty fair job at its primary goal of 271 providing any-to-any connectivity to many millions of computers. 273 Based on a large body of anecdotal evidence, but also on a growing 274 body of experimental evidence [Labovitz02] and analytic work on the 275 stability of BGP under certain policy specifications [Griffin99], the 276 main Internet inter-domain routing protocol, BGP version 4 (BGP-4), 277 appears to have a number of problems. These problems are discussed 278 in more detail in Section 5. Additionally, the hierarchical nature 279 of the inter-domain routing problem appears to be changing as the 280 connectivity between domains becomes increasingly meshed [RFC3221] 281 which alters some of the scaling and structuring assumptions on which 282 BGP-4 is built. Patches and fix-ups may relieve some of these 283 problems but others may require a new architecture and new protocols. 285 3. Historical Perspective 287 3.1. 
The Legacy of RFC1126 289 RFC 1126 [RFC1126] outlined a set of requirements that were intended 290 to guide the development of BGP. 292 Editors' Note: When this document was reviewed by Yakov Rekhter, 293 one of the designers of BGP, his view was that "While some people 294 expected a set of requirements outlined in RFC1126 to guide the 295 development of BGP, in reality the development of BGP happened 296 completely independently of RFC1126. In other words, from the 297 point of view of the development of BGP, RFC1126 turned out to be 298 totally irrelevant." On the other hand, it appears that BGP as 299 currently implemented has met a large proportion of these 300 requirements, especially for unicast traffic. 302 While the network is demonstrably different from what it was in 1989, 303 having 304 o moved from single to multiple administrative control, 305 o increased in size by several orders of magnitude, and 306 o migrated from a fairly tree-like connectivity graph to a meshier 307 style, 308 many of the same requirements remain. As a first step in setting 309 requirements for the future, we need to understand the requirements 310 that were originally set for the current protocols. And in charting 311 a future architecture we must first be sure to do no harm. This 312 means a future domain-based routing system has to support, as its base 313 requirement, the level of function that is available today. 315 The following sections each relate to a requirement, or non- 316 requirement, listed in RFC1126. In fact, the section names are direct 317 quotes from the document. The discussion of these requirements 318 covers the following areas: 320 Explanation: Optional interpretation for today's audience of 321 the original intent of the requirement 323 Relevance: Is the requirement of RFC1126 still relevant, and 324 to what degree? Should it be understood 325 differently in today's environment? 
327 Current practice: How well is the requirement met by current 328 protocols and practice? 330 3.1.1. "General Requirements" 332 3.1.1.1. "Route to Destination" 334 Timely routing to all reachable destinations, including multihoming 335 and multicast. 337 Relevance: Valid, but requirements for multihoming need 338 further discussion and elucidation. The 339 requirement should include multiple source 340 multicast routing. 342 Current practice: Multihoming is not efficient and the proposed 343 inter-domain multicast protocol BGMP [RFC3913] is 344 an add-on to BGP following many of the same 345 strategies but not integrated into the BGP 346 framework. 348 Editors' Note: Multicast routing has moved on 349 again since this was originally written. By 350 2006 BGMP had been effectively superseded. 351 Multicast routing now uses Multiprotocol BGP 352 [RFC4760], the Multicast Source Discovery 353 Protocol (MSDP) [RFC3618] and Protocol 354 Independent Multicast - Sparse Mode (PIM-SM) 355 [RFC2362], [RFC4601], especially the Source 356 Specific Multicast (SSM) subset. 358 3.1.1.2. "Routing is Assured" 360 This requires that a user be notified, within a reasonable time period 361 after persistent attempts, about an inability to provide a service. 363 Relevance: Valid 365 Current practice: There are ICMP messages for this, but in many 366 cases they are not used, either because of fears 367 about creating message storms or uncertainty about 368 whether the end system can do anything useful with 369 the resulting information. IPv6 implementations 370 may be able to make better use of the information 371 as they may have alternative addresses that could 372 be used to exploit an alternative routing. 374 3.1.1.3. "Large System" 376 The architecture was designed to accommodate the growth of the 377 Internet. 379 Relevance: Valid. Properties of Internet topology might be 380 an issue for future scalability (topology varies 381 from very sparse to quite dense at present). 
382 Instead of setting out to accommodate growth in a 383 specific time period, indefinite growth should be 384 accommodated. On the other hand, such growth has 385 to be accommodated without making the protocols 386 too expensive - trade-offs may be necessary. 388 Current practice: Scalability of the current protocols will not be 389 sufficient under the current rate of growth. 390 There are problems with BGP convergence for large 391 dense topologies, problems with the slow speed of 392 routing information propagation between routers in 393 transit domains through the intra-domain protocol, 394 for example, when a failure requires traffic to be 395 redirected to an alternative exit point from the 396 domain (see Section 5.9), limited support for 397 hierarchy, etc. 399 3.1.1.4. "Autonomous Operation" 401 This requirement encapsulates the need for administrative domains 402 ("Autonomous Systems" - AS) to be able to operate autonomously as 403 regards setting routing policy. 405 Relevance: Valid. There may need to be additional 406 requirements for adjusting policy decisions to the 407 global functionality and for avoiding 408 contradictory policies. This would decrease the 409 possibility of unstable routing behavior. 411 There is a need for handling various degrees of 412 trust in autonomous operations, ranging from no 413 trust (e.g., between separate ISPs) to very high 414 trust where the domains have a common goal of 415 optimizing their mutual policies. 417 Policies for intra-domain operations should in 418 some cases be revealed, using suitable 419 abstractions. 421 Current practice: Policy management is in the control of network 422 managers, as required, but there is little support 423 for handling policies at an abstract level for a 424 domain. 426 Cooperating administrative entities decide about 427 the extent of cooperation independently. 
This can 428 lead to inconsistent and potentially incompatible 429 routing policies being applied in notionally 430 cooperating domains. As discussed in Sections 431 5.2, 5.3, and 432 5.10, lack of coordination combined with 433 the global range of effects of BGP policies results in 434 occasional disruption of Internet routing over an 435 area far wider than the domains that are not 436 cooperating effectively. 438 3.1.1.5. "Distributed System" 440 The routing environment is a distributed system. The distributed 441 routing environment supports redundancy and diversity of nodes and 442 links. Both the controlling rule sets, which implement the routing 443 policies, and the places where operational control is applied, 444 through decisions on path selection, are distributed (primarily in 445 the routers). 447 Relevance: Valid. RFC1126 is very clear that we should not 448 be using centralized solutions, but maybe we need 449 a discussion on trade-offs between common 450 knowledge and distribution (i.e., to allow for 451 uniform policy routing, e.g., GSM systems are in a 452 sense centralized, but with hierarchies). 454 Current practice: Routing is very distributed, but lacking the 455 ability to consider optimization over several hops 456 or domains. 457 Editors' Note: Also coordinating the 458 implementation of a set of routing policies 459 across a large domain with many routers running 460 BGP is difficult. The policies have to be 461 turned into BGP rules and applied individually 462 to each router, giving opportunities for 463 mismatch and error. 465 3.1.1.6. "Provide A Credible Environment" 467 The routing environment and services should be based upon mechanisms 468 and information that exhibit both integrity and security. That is, 469 the routers should always be working with credible data derived 470 through the reliable operation of protocols. Security from unwanted 471 modification and influence is required. 473 Relevance: Valid. 
475 Current practice: BGP provides a limited mechanism for 476 authentication and security of peering sessions, 477 but this does not guarantee the authenticity or 478 validity of the routing information that is 479 exchanged. 481 There are certainly security problems with current 482 practice. The Routing Protocol Security 483 Requirements (rpsec) working group has been 484 struggling to agree on a set of requirements for 485 BGP security since early 2002. 487 Editors' note: Proposals for authenticating BGP 488 routing information using certificates were 489 under development by the Secure Inter-Domain 490 Routing (sidr) working group from 2006 through 491 2008. 493 3.1.1.7. "Be A Managed Entity" 495 Requires that the routing system provides adequate information on the 496 state of the network to allow resource, problem and fault management 497 to be carried out effectively and expeditiously. The system must 498 also provide controls that allow managers to use this information to 499 make informed decisions and use it to control the operation of the 500 routing system. 502 Relevance: The requirement is reasonable, but we might need 503 to be more specific on what information should be 504 available, e.g., to prevent routing oscillations. 506 Current practice: All policies are determined locally, where they 507 may appear reasonable but there is limited global 508 coordination through the routing policy databases 509 operated by the Internet registries (AfriNIC, 510 APNIC, ARIN, LACNIC, RIPE, etc.). 512 Operators are not required to register their 513 policies; even when policies are registered, it is 514 difficult to check that the actual policies in use 515 in other domains match the declared policies. 516 Therefore, a manager cannot guarantee to design 517 and implement policies that will interoperate with 518 those of other domains to provide stable routing. 
519 Editors' note: Operators report that management 520 of BGP-based routing remains a function that 521 needs highly-skilled operators and continual 522 attention. 524 3.1.1.8. "Minimize Required Resources" 526 Relevance: Valid, however, the paragraph states that 527 assumptions on significant upgrades shouldn't be 528 made. Although this is reasonable, a new 529 architecture should perhaps be prepared to use 530 upgrades when they occur. 532 Current practice: Most bandwidth is consumed by the exchange of the 533 Network Layer Reachability Information (NLRI). 534 Usage of processing cycles ("Central Processor 535 Usage" - CPU) depends on the stability of the 536 Internet. Both phenomena have a local nature, so 537 there are not scaling problems with bandwidth and 538 CPU usage. Instability of routing increases the 539 consumption of resources in any case. The number 540 of networks in the Internet dominates memory 541 requirements - this is a scaling problem. 543 3.1.2. "Functional Requirements" 545 3.1.2.1. "Route Synthesis Requirements" 547 3.1.2.1.1. "Route around failures dynamically" 549 Relevance: Valid. Should perhaps be stronger. Only 550 providing a best-effort attempt may not be enough 551 if real-time services are to be provided for. 552 Detection of failures may need to be faster than 553 100ms to avoid being noticed by end-users. 555 Current practice: Latency of fail-over is too high; sometimes 556 minutes or longer. 558 3.1.2.1.2. "Provide loop free paths" 560 Relevance: Valid. Loops should occur only with negligible 561 probability and duration. 563 Current practice: Both link-state intra-domain routing and BGP 564 inter-domain routing (if correctly configured) are 565 forwarding-loop free after having converged. 
566 However, convergence time for BGP can be very long 567 and poorly designed routing policies may result in 568 a number of BGP speakers engaging in a cyclic 569 pattern of advertisements and withdrawals which 570 never converges to a stable result [RFC3345]. 571 Part of the reason for long convergence times is 572 the non-locality of the effects of changes in BGP 573 advertisements (see Section 5.3). Modifying the 574 inter-domain routing protocol to make the effects 575 of changes less global, and convergence a more 576 local condition might improve performance, 577 assuming a suitable modification could be 578 developed. 580 3.1.2.1.3. "Know when a path or destination is unavailable" 582 Relevance: Valid to some extent, but there is a trade-off 583 between aggregation and immediate knowledge of 584 reachability. It requires that routing tables 585 contain enough information to determine that the 586 destination is unknown or a path cannot be 587 constructed to reach it. 589 Current practice: Knowledge about lost reachability propagates 590 slowly through the networks due to slow 591 convergence for route withdrawals. 593 3.1.2.1.4. "Provide paths sensitive to administrative policies" 595 Relevance: Valid. Policy control of routing has become 596 increasingly important as the Internet has turned 597 into a business. 599 Current practice: Supported to some extent. Policies can only be 600 applied locally in an AS and not globally. Policy 601 information supplied has a very small probability 602 of affecting policies in other ASs. Furthermore, 603 only static policies are supported; between static 604 policies and policies dependent upon highly 605 volatile events, there is an intermediate class of 606 events that routing should be aware of. Lastly, there is 607 no support for policies other than route 608 properties (such as AS-origin, AS-path, 609 destination prefix, MED-values, etc.). 
611 Editors' note: Subsequent to the original issue 612 of this document, mechanisms that acknowledge 613 the business relationships of operators have 614 been developed, such as the NOPEER community 615 attribute [RFC3765]. However, the level of 616 usage of this attribute is apparently not very 617 great. 619 3.1.2.1.5. "Provide paths sensitive to user policies" 621 Relevance: Valid to some extent, as they may conflict with 622 the policies of the network administrator. It is 623 likely that this requirement will be met by means 624 of different bit transport services offered by an 625 operator, but at the cost of adequate 626 provisioning, authentication and policing when 627 utilizing the service. 629 Current practice: Not supported in normal routing. Can be 630 accomplished to some extent with loose source 631 routing, resulting in inefficient forwarding in 632 the routers. The various attempts to introduce 633 Quality of Service (QoS - e.g., Integrated 634 Services and Differentiated Services (DiffServ)) 635 can also be seen as means to support this 636 requirement but they have met with limited success 637 in terms of providing alternate routes as opposed 638 to providing improved service on the standard 639 route. 640 Editor's Note: From the standpoint of a later 641 time, it would probably be more appropriate to 642 say "total failure" rather than "limited 643 success". 645 3.1.2.1.6. "Provide paths which characterize user quality-of-service 646 requirements" 648 Relevance: Valid to some extent, as they may conflict with 649 the policies of the operator. It is likely that 650 this requirement will be met by means of different 651 bit transport services offered by an operator, but 652 at the cost of adequate provisioning, 653 authentication and policing when utilizing the 654 service. 
It has become clear that offering to 655 provide a particular QoS to any arbitrary 656 destination from a particular source is generally 657 impossible: QoS, except in very 'soft' forms such 658 as overall long-term average packet delay, is 659 generally associated with connection-oriented 660 routing. 662 Current practice: Creating routes with specified QoS is not 663 generally possible at present. 665 3.1.2.1.7. "Provide autonomy between inter- and intra-autonomous system 666 route synthesis" 668 Relevance: Inter- and intra-domain routing should stay 669 independent, but one should note that this to 670 some extent contradicts the previous three 671 requirements. There is a trade-off between 672 abstraction and optimality. 674 Current practice: Inter-domain routing is performed independently of 675 intra-domain routing. Intra-domain routing is, 676 however, especially in transit domains, closely 677 interrelated with inter-domain routing. 679 3.1.2.2. "Forwarding Requirements" 681 3.1.2.2.1. "Decouple inter- and intra-autonomous system forwarding 682 decisions" 684 Relevance: Valid. 686 Current practice: As explained in Section 3.1.2.1.7, intra-domain 687 forwarding in transit domains is dependent on 688 inter-domain forwarding decisions. 690 3.1.2.2.2. "Do not forward datagrams deemed administratively 691 inappropriate" 693 Relevance: Valid, and increasingly important in the context 694 of enforcing policies correctly expressed through 695 routing advertisements but flouted by rogue peers 696 which send traffic for which a route has not been 697 advertised. On the other hand, packets that have 698 been misrouted due to transient routing problems 699 perhaps should still be forwarded to reach the 700 destination, albeit along an unexpected path.
702 Current practice: At stub domains (i.e., domains that do not provide 703 any transit service for any other domains but that 704 connect directly to one or more transit domains) 705 there is packet filtering, e.g., to catch source 706 address spoofing on outgoing traffic or to filter 707 out unwanted incoming traffic. Filtering can in 708 particular reject traffic (such as unauthorized 709 transit traffic) that has been sent to a domain 710 even when it has not advertised a route for such 711 traffic on a given interface. The growing class 712 of 'middle boxes' (midboxes, e.g., Network Address 713 Translators - NATs) is quite likely to apply 714 administrative rules that will prevent forwarding 715 of packets. Note that security policies may 716 deliberately hide administrative denials. In the 717 backbone, intentional packet dropping based on 718 policies is not common. 720 3.1.2.2.3. "Do not forward datagrams to failed resources" 722 Relevance: Unclear, although it is clearly desirable to 723 minimise waste of forwarding resources by 724 discarding, at the earliest opportunity, datagrams 725 which cannot be delivered. There is a trade-off 726 between scalability and keeping track of 727 unreachable resources. The requirement 728 effectively imposes an obligation on adjacent 729 nodes to monitor for failures and take steps to 730 cause rerouting at the earliest opportunity if a 731 failure is detected. However, packets that are 732 already in flight or queued on a failed link 733 cannot generally be rescued. 735 Current practice: Routing protocols use both internal adjacency 736 management sub-protocols (e.g., Hello protocols) 737 and information from equipment and lower layer 738 link watchdogs to keep track of failures in 739 routers and connecting links. Failures will 740 eventually result in the routing protocol 741 reconfiguring the routing to avoid (if possible) a 742 failed resource, but this is generally very slow 743 (30s or more).
In the meantime datagrams may well 744 be forwarded to failed resources. In general 745 terms, end hosts and some non-router middle boxes 746 do not participate in these notifications, and 747 failures of such boxes will not affect the routing 748 system. 750 3.1.2.2.4. "Forward datagram according to its characteristics" 752 Relevance: Valid. This is necessary to enable 753 differentiation in the network, based on QoS, 754 precedence, policy, or security. 756 Current practice: Ingress and egress filtering can be done based on 757 policy. Some networks discriminate on the basis 758 of requested QoS. 760 3.1.2.3. "Information Requirements" 762 3.1.2.3.1. "Provide a distributed and descriptive information base" 764 Relevance: Valid; however, an alternative arrangement of 765 information bases, possibly with an element of 766 centralization for the domain (as mentioned in 767 Section 3.1.1.5), might offer some advantages 768 through the ability to optimize across the domain 769 and respond more quickly to changes and failures. 771 Current practice: The information base is distributed, but it is 772 unclear whether it supports all necessary routing 773 functionality. 775 3.1.2.3.2. "Determine resource availability" 776 Relevance: Valid. It should be possible to determine the 777 availability and levels of availability of any 778 resource (such as bandwidth) needed to carry out 779 routing. This avoids having to discover 780 unavailability through failure. Resource location 781 and discovery is arguably a separate concern that 782 could be addressed outside the core routing 783 requirements. 785 Current practice: Resource availability is predominantly handled 786 outside of the routing system. 788 3.1.2.3.3. "Restrain transmission utilization" 790 Relevance: Valid. However, certain requirements in the 791 control plane, such as fast detection of faults, 792 may be worth the consumption of more resources.
793 Similarly, simplicity of implementation may make 794 it cheaper to 'back haul' traffic to central 795 locations to minimise the cost of routing if 796 bandwidth is cheaper than processing. 798 Current practice: BGP messages probably do not ordinarily consume 799 excessive resources, but might during erroneous 800 conditions. In the data plane, the near universal 801 adoption of shortest path protocols could be 802 considered to result in minimization of 803 transmission utilization. 805 3.1.2.3.4. "Allow limited information exchange" 807 Relevance: Valid. But perhaps routing could be improved if 808 certain information (especially policies) could be 809 made available either globally or at least for a wider 810 defined locality. 811 Editors' note: Limited information exchange 812 would be potentially compatible with a more 813 local form of convergence than BGP tries to 814 achieve today. Limited information exchange is 815 potentially incompatible with global 816 convergence. 817 Current practice: Policies are used to determine which reachability 818 information is exported, but neighbors receiving 819 the information are not generally aware of the 820 policies that resulted in this export. 822 3.1.2.4. "Environmental Requirements" 824 3.1.2.4.1. "Support a packet-switching environment" 826 Relevance: Valid, but the routing system should, perhaps, not be 827 limited to this exclusively. 829 Current practice: Supported. 831 3.1.2.4.2. "Accommodate a connection-less oriented user transport 832 service" 834 Relevance: Valid, but the routing system should, perhaps, not be 835 limited to this exclusively. 837 Current practice: Accommodated. 839 3.1.2.4.3. "Accommodate 10K autonomous systems and 100K networks" 841 Relevance: No longer valid. Needs to be increased, 842 potentially indefinitely.
It is extremely 843 difficult to foresee the future expansion in size of 844 the Internet, so the Utopian solution would be 845 to achieve an Internet whose architecture is 846 scale-invariant. Regrettably, this may not be 847 achievable without introducing undesirable 848 complexity, and a suitable trade-off between 849 complexity and scalability is likely to be 850 necessary. 852 Current practice: Supported but perhaps reaching its limit. Since 853 the original version of this document was written 854 in 2001, the number of ASs advertised has grown 855 from around 8000 to 20000, and almost 35000 AS 856 numbers have been allocated by the regional 857 registries [Huston05]. If this growth continues, 858 the original 16-bit AS space in BGP-4 will be 859 exhausted in less than 5 years. Planning for an 860 extended AS space is now an urgent requirement. 862 3.1.2.4.4. "Allow for arbitrary interconnection of autonomous systems" 864 Relevance: Valid. However, perhaps not all interconnections 865 should be accessible globally. 867 Current practice: BGP-4 allows for arbitrary interconnections. 869 3.1.2.5. "General Objectives" 871 3.1.2.5.1. "Provide routing services in a timely manner" 873 Relevance: Valid, as stated before. It might be acceptable 874 for a more complex service to take longer to 875 deliver, but it still has to meet the 876 application's requirements - routing has to be at 877 the service of the end-to-end principle. 878 Editors' note: Delays in setting up connections 879 due to network functions such as NAT boxes are 880 becoming increasingly problematic. The routing 881 system should try to keep any routing delay to 882 a minimum. 884 Current practice: More or less, with the exception of convergence 885 and fault robustness. 887 3.1.2.5.2. "Minimize constraints on systems with limited resources" 889 Relevance: Valid. 891 Current practice: Systems with limited resources are typically stub 892 domains that advertise very little information. 894 3.1.2.5.3.
"Minimize impact of dissimilarities between autonomous 895 systems" 897 Relevance: Important. This requirement is critical to a 898 future architecture. In a domain-based routing 899 environment where the internal properties of 900 domains may differ radically, it will be important 901 to be sure that these dissimilarities are 902 minimized at the borders. 903 Current practice: For the most part this capability is not really 904 required in today's networks since the intra- 905 domain attributes are broadly similar across 906 domains. 908 3.1.2.5.4. "Accommodate the addressing schemes and protocol mechanisms 909 of the autonomous systems" 911 Relevance: Important, probably more so than when RFC1126 was 912 originally developed, because of the potential 913 deployment of IPv6, wider usage of MPLS, and the 914 increasing usage of VPNs. 916 Current practice: Only one global addressing scheme is supported in 917 most autonomous systems, but the availability of 918 IPv6 services is steadily increasing. Some global 919 backbones support IPv6 routing and forwarding. 921 3.1.2.5.5. "Must be implementable by network vendors" 923 Relevance: Valid, but note that what can be implemented today 924 is different from what was possible when RFC1126 925 was written: a future domain-based routing 926 architecture should not be unreasonably 927 constrained by past limitations. 929 Current practice: BGP was implemented and meets a large proportion 930 of the original requirements. 932 3.1.3. "Non-Goals" 934 RFC1126 also included a section discussing non-goals. This section 935 discusses the extent to which these are still non-goals. It also 936 considers whether the fact that they were non-goals adversely affects 937 today's IDR system. 939 3.1.3.1.
"Ubiquity" 941 The authors of RFC 1126 were explicitly saying that IP and its inter- 942 domain routing system need not be deployed in every AS, and a 943 participant should not necessarily expect to be able to reach a given 944 AS, possibly because of routing policies. In a sense, this 'non-goal' 945 has effectively been achieved by the Internet and IP protocols. This 946 requirement reflects a different world view where there was serious 947 competition for network protocols, which is really no longer the 948 case. Ubiquitous deployment of inter-domain routing in particular 949 has been achieved and must not be undone by any proposed future 950 domain-based routing architecture. On the other hand: 951 o ubiquitous connectivity cannot be reached in a policy-sensitive 952 environment and should not be an aim, 953 * Editors' Note: It has been pointed out that this statement 954 could be interpreted as being contrary to the Internet mission 955 of providing universal connectivity. The fact that limits to 956 connectivity will be added as operational requirements in a 957 policy-sensitive environment should not imply that a future 958 domain-based routing architecture contains intrinsic limits on 959 connectivity. 960 o it must not be required that the same routing mechanisms are used 961 throughout, provided that they can interoperate appropriately, 962 o the information needed to control routing in a part of the network 963 should not necessarily be ubiquitously available, and it must be 964 possible for an operator to hide commercially sensitive 965 information that is not needed outside a domain. 966 o the introduction of IPv6 reintroduces an element of diversity into 967 the world of network protocols, but the similarities of IPv4 and 968 IPv6 as regards routing and forwarding make this event less likely 969 to drive an immediate diversification in routing systems.
The 970 potential for further growth in the size of the network enabled by 971 IPv6 is very likely to require changes in the future: whether this 972 results in the replacement of one de facto ubiquitous system with 973 another remains to be seen but cannot be a requirement - it will 974 have to interoperate with BGP during the transition. 976 Relevance: De facto essential for a future domain-based 977 routing architecture, but what is required is 978 ubiquity of the routing system rather than 979 ubiquity of connectivity, and it must be capable of 980 a gradual takeover through interoperation with the 981 existing system. 983 Current practice: De facto ubiquity achieved. 985 3.1.3.2. "Congestion control" 987 Relevance: It is not clear if this non-goal was to be applied 988 to routing or forwarding. It is definitely a non- 989 goal to adapt the choice of route when there is 990 transient congestion. However, to add support for 991 congestion avoidance (e.g., Explicit Congestion 992 Notification (ECN) and ICMP messages) in the 993 forwarding process would be a useful addition. 994 There is also extensive work going on in traffic 995 engineering which should result in congestion 996 avoidance through routing as well as in 997 forwarding. 999 Current practice: Some ICMP messages (e.g., source quench) exist to 1000 deal with congestion control, but these are not 1001 generally used, as they either make the problem 1002 worse or there is no mechanism to reflect the 1003 message into the application which is the 1004 source of the traffic. 1006 3.1.3.3. "Load splitting" 1008 Relevance: This should be neither a non-goal nor an explicit 1009 goal. It might be desirable in some cases and 1010 should be considered as an optional architectural 1011 feature. 1013 Current practice: Can be implemented by exporting different prefixes 1014 on different links, but this requires manual 1015 configuration and does not consider actual load.
1017 Editors' Note: This configuration is carried 1018 out extensively as of 2006 and has been a 1019 significant factor in routing table bloat. If 1020 this need is a real operational requirement, as 1021 it seems to be for multihomed or otherwise 1022 richly connected sites, it will be necessary to 1023 reclassify this as a real and important goal. 1025 3.1.3.4. "Maximizing the utilization of resources" 1027 Relevance: Valid. Cost-efficiency should be striven for; we 1028 note that maximizing resource utilization does not 1029 always lead to the greatest cost-efficiency. 1031 Current practice: Not currently part of the system, though often 1032 provided as a 'hacked in' feature through manual 1033 configuration. 1035 3.1.3.5. "Schedule to deadline service" 1037 This non-goal was put in place to ensure that the IDR system did not 1038 have to meet real-time deadline goals such as might apply to Constant 1039 Bit Rate (CBR) real-time services in ATM. 1041 Relevance: The hard form of deadline services is still a non- 1042 goal for the future domain-based routing 1043 architecture, but overall delay bounds are much 1044 more of the essence than was the case when RFC1126 1045 was written. 1047 Current practice: Service providers are now offering overall 1048 probabilistic delay bounds on traffic contracts. 1049 To implement these contracts, there is a 1050 requirement for a rather looser form of delay- 1051 sensitive routing. 1053 3.1.3.6. "Non-interference policies of resource utilization" 1055 The requirement in RFC1126 is somewhat opaque, but appears to imply 1056 that what we would today call QoS routing is a non-goal and that 1057 routing would not seek to control the elastic characteristics of 1058 Internet traffic, whereby a TCP connection can seek to utilize all the 1059 spare bandwidth on a route, possibly to the detriment of other 1060 connections sharing the route or crossing it. 1062 Relevance: Open Issue.
It is not clear whether dynamic QoS 1063 routing can or should be implemented. Such a 1064 system would seek to control the admission and 1065 routing of traffic depending on current or recent 1066 resource utilization. This would be particularly 1067 problematic where traffic crosses an ownership 1068 boundary because of the need for potentially 1069 commercially sensitive information to be made 1070 available outside the ownership boundary. 1072 Current practice: Routing does not consider dynamic resource 1073 availability. Forwarding can support service 1074 differentiation. 1076 3.2. ISO OSI IDRP, BGP and the Development of Policy Routing 1078 During the decade before the widespread success of the World Wide 1079 Web, ISO was developing the communications architecture and protocol 1080 suite Open Systems Interconnection (OSI). For a considerable part of 1081 this time OSI was seen as a possible competitor for, and even a 1082 replacement for, the IP suite as the basis for the Internet. The 1083 technical developments of the two protocol suites were quite heavily 1084 interrelated, with each providing ideas and even components that were 1085 adapted into the other suite. 1087 During the early stages of the development of OSI, the IP suite was 1088 still mainly in use on the ARPANET and the relatively small scale 1089 first phase NSFnet. This was effectively a single administrative 1090 domain with a simple tree-structured network in a three-level 1091 hierarchy connected to a single logical exchange point (the NSFnet 1092 backbone). In the second half of the 1980s the NSFNET was starting 1093 on the growth and transformation that would lead to today's Internet. 1094 It was becoming clear that the backbone routing protocol, the 1095 Exterior Gateway Protocol (EGP) [RFC0904], was not going to cope even 1096 with the limited expansion being planned.
EGP was an "all informed" 1097 protocol which needed to know the identities of all gateways, and this 1098 was no longer reasonable. With the increasing complexity of the 1099 NSFnet and the linkage of the NSFnet network to other networks, there 1100 was a desire for policy-based routing which would allow 1101 administrators to manage the flow of packets between networks. The 1102 first version of the Border Gateway Protocol (BGP-1) [RFC1105] was 1103 developed as a replacement for EGP with policy capabilities - a 1104 stopgap EGP version 3 had been created as an interim measure while 1105 BGP was developed. BGP was designed to work on a hierarchically 1106 structured network, such as the original NSFNET, but could also work 1107 on networks that were at least partially non-hierarchical, where there 1108 were links between ASs at the same level in the hierarchy (we would 1109 now call these 'peering arrangements'), although the protocol made a 1110 distinction between different kinds of links (links were classified as 1111 upwards, downwards, or sideways). ASs themselves were a 'fix' for the 1112 complexity that developed in the three-tier structure of the NSFnet. 1114 Meanwhile the OSI architects, led by Lyman Chapin, were developing a 1115 much more general architecture for large scale networks. They had 1116 recognized that no one node, especially an end-system (host), could or 1117 should attempt to remember routes from "here" to "anywhere" - this 1118 sounds obvious today but was not so obvious 20 years ago. They were 1119 also considering hierarchical networks with independently 1120 administered domains - a model already well entrenched in the public 1121 switched telephone network. This led to a vision of a network with 1122 multiple independent administrative domains with an arbitrary 1123 interconnection graph and a hierarchy of routing functionality. This 1124 architecture was fairly well established by 1987 [Tsuchiya87].
The 1125 architecture initially envisaged a three level routing functionality 1126 hierarchy in which each layer had significantly different 1127 characteristics: 1129 1. *End-system to Intermediate system routing (host to router)*, in 1130 which the principal functions are discovery and redirection. 1132 2. *Intra-domain intermediate system to intermediate system routing 1133 (router to router)*, in which "best" routes between end-systems 1134 in a single administrative domain are computed and used. A 1135 single algorithm and routing protocol would be used throughout 1136 any one domain. 1138 3. *Inter-domain intermediate-system to intermediate system routing 1139 (router to router)*, in which routes between routing domains 1140 within administrative domains are computed (routing is considered 1141 separately between administrative domains and routing domains). 1143 Level 3 of this hierarchy was still somewhat fuzzy. Tsuchiya says: 1145 The last two components, Inter-Domain and Inter-Administration 1146 routing, are less clear-cut. It is not obvious what should be 1147 standardized with respect to these two components of routing. For 1148 example, for Inter-Domain routing, what can be expected from the 1149 Domains? By asking Domains to provide some kind of external 1150 behavior, we limit their autonomy. If we expect nothing of their 1151 external behavior, then routing functionality will be minimal. 1153 Across administrations, it is not known how much trust there will 1154 be. In fact, the definition of trust itself can only be 1155 determined by the two or more administrations involved. 1157 Fundamentally, the problem with Inter-Domain and Inter- 1158 Administration routing is that autonomy and mistrust are both 1159 antithetical to routing. Accomplishing either will involve a 1160 number of tradeoffs which will require more knowledge about the 1161 environments within which they will operate. 
1163 Further refinement of the model occurred over the next couple of 1164 years, and a more fully formed view is given by Huitema and Dabbous in 1165 1989 [Huitema90]. By this stage work on the original IS-IS link 1166 state protocol, originated by the Digital Equipment Corporation 1167 (DEC), was fairly advanced and was close to becoming a Draft 1168 International Standard. IS-IS is, of course, a major component of 1169 intra-domain routing today and inspired the development of the Open 1170 Shortest Path First (OSPF) family. However, Huitema and Dabbous were 1171 not able to give any indication of protocol work for Level 3. There 1172 are hints of possible use of centralized route servers. 1174 In the meantime, the NSFnet consortium and the IETF had been 1175 struggling with the rapid growth of the NSFnet. It had been clear 1176 since fairly early on that EGP was not suitable for handling the 1177 expanding network, and the race was on to find a replacement. There 1178 had been some intent to include a metric in EGP to facilitate routing 1179 decisions, but no agreement could be reached on how to define the 1180 metric. The lack of trust was seen as one of the main reasons that 1181 EGP could not establish a globally acceptable routing metric: again, 1182 from this distance in time, this seems a clearly futile aim! 1183 Consequently, EGP became effectively a rudimentary path-vector 1184 protocol which linked gateways with Autonomous Systems. It was 1185 totally reliant on the tree-structured network to avoid routing loops, 1186 and the "all informed" nature of EGP meant that update packets became 1187 very large. BGP version 1 [RFC1105] was standardized in 1989 but had 1188 been in development for some time before this and had already seen 1189 action in production networks prior to standardization. BGP was the 1190 first real path-vector routing protocol and was intended to relieve 1191 some of the scaling problems as well as providing policy-based 1192 routing.
Routes were described as paths along a 'vector' of ASs 1193 without any associated cost metric. This way of describing routes 1194 was explicitly intended to allow detection of routing loops. It was 1195 assumed that the intra-domain routing system was loop-free, with the 1196 implication that the total routing system would be loop-free if there 1197 were no loops in the AS path. Note that there were no theoretical 1198 underpinnings for this work, and it traded guaranteed convergence 1199 for freedom from routing loops. 1201 Also, the NSFnet was a government-funded research and education 1202 network. Commercial companies which were partners in some of the 1203 projects were using the NSFnet for their research activities, but it 1204 was becoming clear that these companies also needed networks for 1205 commercial traffic. NSFnet had put in place "acceptable use" 1206 policies which were intended to limit the use of the network. 1207 However, there was little or no technology to support the legal 1208 framework. 1210 Practical experience, IETF and IAB discussion (centred in the Internet 1211 Architecture Task Force), and the OSI theoretical work were by now 1212 coming to the same conclusions: 1213 o Networks were going to be composed of multiple administrative 1214 domains (the federated network), 1215 o The connections between these domains would be an arbitrary graph 1216 and certainly not a tree, 1217 o The administrative domains would wish to establish distinctive, 1218 independent routing policies through the graph of Autonomous 1219 Systems, and 1220 o Administrative Domains would have a degree of distrust of each 1221 other which would mean that policies would remain opaque.
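The AS-path loop-detection rule described above can be sketched as follows (a simplified illustration in Python; it is not BGP's actual message processing, and the AS numbers are invented examples):

```python
# Simplified sketch of path-vector loop detection as introduced by BGP:
# a route is advertised together with the list of ASs it has traversed,
# a speaker discards any route whose AS path already contains its own
# AS number, and it prepends its AS number when re-advertising.

def accept_update(local_as, as_path):
    """Discard a received route whose AS path already contains our AS."""
    return local_as not in as_path

def propagate(local_as, as_path):
    """Prepend our AS number before re-advertising a route to neighbors."""
    return [local_as] + list(as_path)
```

Note that this rule guarantees only freedom from persistent routing loops; as [RFC3345] illustrates, it does not guarantee that the system converges.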
1223 These views were reflected in the contribution of Susan Hares (then 1224 working for Merit Networks) to the Internet Architecture 1225 (INARC) workshop in 1989, summarized in the report of the workshop 1226 [INARC89]: 1228 The rich interconnectivity within the Internet causes routing 1229 problems today. However, the presenter believes the problem is 1230 not the high degree of interconnection, but the routing protocols 1231 and models upon which these protocols are based. Rich 1232 interconnectivity can provide redundancy which can help packets 1233 moving even through periods of outages. Our model of interdomain 1234 routing needs to change. The model of autonomous confederations 1235 and autonomous systems [RFC0975] no longer fits the reality of 1236 many regional networks. The ISO models of administrative domain 1237 and routing domains better fit the current Internet's routing 1238 structure. 1240 With the first NSFNET backbone, NSF assumed that the Internet 1241 would be used as a production network for research traffic. We 1242 cannot stop these networks for a month and install all new routing 1243 protocols. The Internet will need to evolve its changes to 1244 networking protocols while still continuing to serve its users. 1246 This reality colors how plans are made to change routing 1247 protocols. 1249 It is also interesting to note that the difficulties of organizing a 1250 transition were recognized at this stage and have not been seriously 1251 explored or resolved since. 1253 Policies would primarily be concerned with controlling which traffic 1254 should be allowed to transit a domain (to satisfy commercial 1255 constraints or acceptable use policies), thereby controlling which 1256 traffic uses the resources of the domain. The solution adopted by 1257 both the IETF and OSI was a form of distance vector hop-by-hop 1258 routing with explicit policy terms.
The reasoning for this choice 1259 can be found in Breslau and Estrin's 1990 paper [Breslau90] 1260 (implicitly, because some other alternatives are considered, such as a 1261 link-state protocol with policy support, which, with hindsight, would 1262 have had even greater problems than BGP on a global-scale network). 1263 Traditional distance vector protocols exchanged routing information 1264 in the form of a destination and a metric. The new protocols 1265 explicitly associated policy expressions with the route by including 1266 a list of the source ASs that are permitted to use the route 1267 described in the routing update, and/or a list of all ASs traversed 1268 along the advertised route. 1270 Parallel protocol developments were already in progress by the time 1271 this paper was published: BGP version 2 [RFC1163] in the IETF and the 1272 Inter-Domain Routing Protocol (IDRP) [ISO10747], which would be the 1273 Level 3 routing protocol for the OSI architecture. IDRP was 1274 developed under the aegis of the ANSI X3S3.3 working group led by 1275 Lyman Chapin and Charles Kunzinger. The two protocols were very 1276 similar in basic design, but IDRP had some extra features, some of 1277 which have been incorporated into later versions of BGP; others may 1278 yet be so, and still others may be seen to be inappropriate. Breslau 1279 and Estrin summarize the design of IDRP as follows: 1281 IDRP attempts to solve the looping and convergence problems 1282 inherent in distance vector routing by including full AD 1283 [Administrative Domain - essentially the equivalent of what are 1284 now called ASs] path information in routing updates. Each routing 1285 update includes the set of ADs that must be traversed in order to 1286 reach the specified destination. In this way, routes that contain 1287 AD loops can be avoided. 1289 IDRP updates also contain additional information relevant to 1290 policy constraints.
For instance, these updates can specify what 1291 other ADs are allowed to receive the information described in the 1292 update. In this way, IDRP is able to express source specific 1293 policies. The IDRP protocol also provides the structure for the 1294 addition of other types of policy related information in routing 1295 updates. For example, User Class Identifiers (UCI) could also be 1296 included as policy attributes in routing updates. 1298 Using the policy route attributes IDRP provides the framework for 1299 expressing more fine grained policy in routing decisions. 1300 However, because it uses hop-by-hop distance vector routing, it 1301 only allows a single route to each destination per-QOS to be 1302 advertised. As the policy attributes associated with routes 1303 become more fine grained, advertised routes will be applicable to 1304 fewer sources. This implies a need for multiple routes to be 1305 advertised for each destination in order to increase the 1306 probability that sources have acceptable routes available to them. 1307 This effectively replicates the routing table per forwarding 1308 entity for each QoS, UCI, source combination that might appear in 1309 a packet. Consequently, we claim that this approach does not 1310 scale well as policies become more fine grained, i.e., source or 1311 UCI specific policies. 1313 Over the next three or four years, successive versions of BGP (BGP-2 1314 [RFC1163], BGP-3 [RFC1267], and BGP-4 [RFC1771]) were deployed to cope 1315 with the growing and by now commercialized Internet. From BGP-2 1316 onwards, BGP made no assumptions about an overall structure of 1317 interconnections, allowing it to cope with today's dense web of 1318 interconnections between ASs. BGP version 4 was developed to handle 1319 the change from classful to classless addressing. For most of this 1320 time, IDRP was being developed in parallel, and both protocols were 1321 implemented in the Merit gatedaemon routing protocol suite.
During 1322 this time there was a movement within the IETF that saw BGP as a 1323 stopgap measure to be used until the more sophisticated IDRP could be 1324 adapted to run over IP instead of the OSI connectionless protocol 1325 CLNP. However, unlike its intra-domain counterpart IS-IS, which has 1326 stood the test of time and indeed proved to be more flexible than 1327 OSPF, IDRP was ultimately not adopted by the market. By the time the 1328 NSFnet backbone was decommissioned in 1995, BGP-4 was the inter- 1329 domain routing protocol of choice and OSI's star was already 1330 beginning to wane. IDRP is now little remembered. 1332 A more complete account of the capabilities of IDRP can be found in 1333 chapter 14 of David Piscitello and Lyman Chapin's book 'Open Systems 1334 Networking: TCP/IP and OSI', which is now readable on the Internet 1335 [Chapin94]. 1337 IDRP also contained quite extensive means for securing routing 1338 exchanges, much of it based on X.509 certificates for each router and 1339 public/private key encryption of routing updates. 1341 Some of the capabilities of IDRP which might yet appear in a future 1342 version of BGP include the ability to manage routes with explicit QoS 1343 classes, and the concept of domain confederations (somewhat different 1344 from the confederation mechanism in today's BGP) as an extra level in 1345 the hierarchy of routing. 1347 3.3. Nimrod Requirements 1349 Nimrod, as expressed by Noel Chiappa in his early document "A New IP 1350 Routing and Addressing Architecture" [Chiappa91] and later in the 1351 NIMROD Working Group documents [RFC1753] and [RFC1992], established a 1352 number of requirements that need to be considered by any new routing 1353 architecture. The Nimrod requirements took RFC 1126 as a starting 1354 point and went further. 1356 The three goals of Nimrod, quoted from [RFC1992], were as follows: 1357 1.
To support a dynamic internetwork of _arbitrary size_ (our 1358 emphasis) by providing mechanisms to control the amount of 1359 routing information that must be known throughout an 1360 internetwork. 1361 2. To provide service-specific routing in the presence of multiple 1362 constraints imposed by service providers and users. 1363 3. To admit incremental deployment throughout an internetwork. 1365 It is certain that these goals should be considered requirements for 1366 any new domain-based routing architecture. 1367 o As discussed in other sections of this document, the rate of growth 1368 of the amount of information needed to maintain the routing system 1369 is such that the system may not be able to scale up as the 1370 Internet expands as foreseen. And yet, as the services and 1371 constraints upon those services grow, there is a need for more 1372 information to be maintained by the routing system. One of the 1373 key terms in the first requirement is 'control'. While 1374 increasing amounts of information need to be known and maintained 1375 in the Internet, the amounts and kinds of information that are 1376 distributed can be controlled. This goal should be reflected in 1377 the requirements for the future domain-based architecture. 1378 o If anything, the demand for specific services in the Internet has 1379 grown since 1996 when the Nimrod architecture was published. 1380 Additionally, the kinds of constraints that service providers need 1381 to impose upon their networks and that services need to impose 1382 upon the routing have also increased. Any changes made to the 1383 network in the last half-decade have not significantly improved 1384 this situation. 1385 o The ability to incrementally deploy any new routing architecture 1386 within the Internet is still an absolute necessity. It is 1387 impossible to imagine that a new routing architecture could 1388 supplant the current architecture on a flag day.
1390 At one point in time, Nimrod, with its addressing and routing 1391 architectures, was seen as a candidate for IPng. History shows that 1392 it was not accepted as the IPng, having been ruled out of the 1393 selection process by the IESG in 1994 on the grounds that it was 'too 1394 much of a research effort' [RFC1752], although input for the 1395 requirements of IPng was explicitly solicited from Chiappa [RFC1753]. 1396 Instead IPv6 has been put forth as the IPng. Without entering a 1397 discussion of the relative merits of IPv6 versus Nimrod, it is 1398 apparent that IPv6, while it may solve many problems, does not solve 1399 the critical routing problems in the Internet today. In fact, in some 1400 sense it exacerbates them by adding a requirement for support of two 1401 Internet protocols and their respective addressing methods. In many 1402 ways the addition of IPv6 to the mix of methods in today's Internet 1403 only points to the fact that the goals, as set forth by the Nimrod 1404 team, remain as necessary goals. 1406 There is another sense in which study of Nimrod and its architecture 1407 may be important to deriving a future domain-based routing 1408 architecture. Nimrod can be said to have two derivatives: 1409 o Multi-Protocol Label Switching (MPLS), in that it took the notion 1410 of forwarding along well-known paths 1411 o Private Network-Node Interface (PNNI), in that it took the notion 1412 of abstracting topological information and using that information 1413 to create connections for traffic. 1415 It is important to note that, whilst MPLS and PNNI borrowed ideas 1416 from Nimrod, neither of them can be said to be an implementation of 1417 this architecture. 1419 3.4. PNNI 1421 The Private Network-Node Interface (PNNI) routing protocol was 1422 developed under the ATM Forum's auspices as a hierarchical route 1423 determination protocol for ATM, a connection-oriented architecture.
1424 It is reputed to have developed several of its methods from a study 1425 of the Nimrod architecture. What can be gained from an analysis of 1426 what did and did not succeed in PNNI? 1428 The PNNI protocol includes the assumption that all peer groups are 1429 willing to cooperate, and that the entire network is under the same 1430 top-level administration. Are there limitations that stem from this 'world 1431 node' presupposition? As discussed in [RFC3221], the Internet is no 1432 longer a clean hierarchy and there is a lot of resistance to having 1433 any sort of 'ultimate authority' controlling or even brokering 1434 communication. 1436 PNNI is the first deployed example of a routing protocol that uses 1437 abstract map exchange (as opposed to distance vector or link state 1438 mechanisms) for inter-domain routing information exchange. One 1439 consequence of this is that domains need not all use the same 1440 mechanism for map creation. What were the results of this 1441 abstraction and source-based route calculation mechanism? 1443 Since the authors of this document do not have experience running a 1444 PNNI network, the comments above are from a theoretical perspective. 1445 Further research on these issues based on operational experience is 1446 required. 1448 4. Recent Research Work 1450 4.1. Developments in Internet Connectivity 1452 The work commissioned from Geoff Huston by the Internet Architecture 1453 Board [RFC3221] draws a number of conclusions from analysis of BGP 1454 routing tables and routing registry databases: 1455 o The connectivity between provider ASs is becoming more like a 1456 dense mesh than the tree structure that was commonly assumed 1457 a couple of years ago. This has been driven by the 1458 increasing amounts charged for peering and transit traffic by 1459 global service providers. Local direct peering and Internet 1460 exchanges are becoming steadily more common as the cost of local 1461 fibre connections drops.
1462 o End user sites are increasingly resorting to multi-homing onto two 1463 or more service providers as a way of improving resiliency. This 1464 has a knock-on effect of spectacularly fast depletion of the 1465 available pool of AS numbers, as end user sites require public AS 1466 numbers to become multi-homed, and a corresponding increase in the 1467 number of prefixes advertised in BGP. 1468 o Multi-homed sites are using advertisement of longer prefixes in 1469 BGP as a means of traffic engineering to spread load across their 1470 multiple external connections, with further impact on the size of 1471 the BGP tables. 1472 o Operational practices are not uniform, and in some cases lack of 1473 knowledge or training is leading to instability and/or excessive 1474 advertisement of routes by incorrectly configured BGP speakers. 1475 o All these factors are quickly negating the advantages in limiting 1476 the expansion of BGP routing tables that were gained by the 1477 introduction of CIDR and consequent prefix aggregation in BGP. It 1478 is also now impossible for IPv6 to realize the world view in which 1479 the default free zone would be limited to perhaps 10,000 prefixes. 1480 o The typical 'width' of the Internet in AS hops is now around five, 1481 and much less in many cases. 1483 These conclusions have a considerable impact on the requirements for 1484 the future domain-based routing architecture: 1486 o Topological hierarchy (e.g. mandating a tree-structured 1487 connectivity) cannot be relied upon to deliver scalability of a 1488 large Internet routing system 1489 o Aggregation cannot be relied upon to constrain the size of routing 1490 tables for an all-informed routing system 1492 4.2. DARPA NewArch Project 1494 DARPA funded a project to think about a new architecture for a future 1495 generation Internet, called NewArch. 1496 Work started in the first half of 2000 and the main project finished 1497 in 2003 [NewArch03].
1499 The main conclusion is that, as the Internet becomes 1500 mainstream infrastructure, fewer and fewer of the requirements are 1501 truly global; they may apply with different force, or not at all, in 1502 certain parts of the network. This (it is claimed) makes the 1503 compilation of a single, ordered list of requirements deeply 1504 problematic. Instead we may have to produce multiple requirement 1505 sets with support for differing requirement importance at different 1506 times and in different places. This 'meta-requirement' significantly 1507 impacts architectural design. 1509 Potential new technical requirements identified so far include: 1510 o Commercial environment concerns such as richer inter-provider 1511 policy controls and support for a variety of payment models 1512 o Trustworthiness 1513 o Ubiquitous mobility 1514 o Policy-driven self-organisation ('deep auto-configuration') 1515 o Extreme short-timescale resource variability 1516 o Capacity allocation mechanisms 1517 o Speed, propagation delay and delay/bandwidth product issues 1519 Non-technical or political 'requirements' include: 1520 o Legal and policy drivers such as 1521 * Privacy and free/anonymous speech 1522 * Intellectual property concerns 1523 * Encryption export controls 1524 * Law enforcement surveillance regulations 1525 * Charging and taxation issues 1526 o Reconciling national variations and consistent operation in a 1527 world-wide infrastructure 1529 The conclusions of the work are now summarized in the final report 1530 [NewArch03]. 1532 4.2.1. Defending the End-to-End Principle 1534 One of the participants in the DARPA NewArch work (Dave Clark), 1535 together with one of his associates, has published a very interesting paper 1536 analyzing the impact of some of the new requirements identified in 1537 NewArch (see Section 4.2) on the end-to-end principle that has guided 1538 the development of the Internet to date [Blumenthal01].
Their 1539 primary conclusion is that the loss of trust between the users at the 1540 two ends of an end-to-end communication has the most fundamental effect on the Internet. 1541 This is clear in the context of the routing system, where operators 1542 are unwilling to reveal the inner workings of their networks for 1543 commercial reasons. Similarly, trusted third parties and their 1544 avatars (mainly middleboxes of one sort or another) have a major impact 1545 on the end-to-end principles and the routing mechanisms that went 1546 with them. Overall, the end-to-end principles should be defended so 1547 far as is possible (some changes are already too deeply embedded to 1548 make it possible to go back to full trust and openness), at least 1549 partly as a means of staving off the day when the network will ossify 1550 into an unchangeable form and function (much as the telephone network 1551 has done). The hope is that by that time a new Internet will appear 1552 to offer a context for unfettered innovation. 1554 5. Existing Problems of BGP and the Current Inter-/Intra-Domain 1555 Architecture 1557 Although most of the people who have to work with BGP today believe 1558 it to be a useful, working protocol, discussions have brought to 1559 light a number of areas where BGP, or the relationship between BGP and 1560 the intra-domain routing protocols in use today, could be improved. 1561 BGP-4 has been and continues to be extended since it was originally 1562 introduced in [RFC1771], and the protocol as deployed has been 1563 documented in [RFC4271]. This section is, to a large extent, a wish 1564 list for the future domain-based routing architecture based on those 1565 areas where BGP is seen to be lacking, rather than simply a list of 1566 problems with BGP.
The shortcomings of today's inter-domain routing 1567 system have also been extensively surveyed in 'Architectural 1568 Requirements for Inter-Domain Routing in the Internet' [RFC3221], 1569 particularly with respect to its stability and the problems produced 1570 by explosions in the size of the Internet. 1572 5.1. BGP and Auto-aggregation 1574 The initial stability, followed by linear growth, of the number 1575 of routing objects (prefixes) that was achieved by the introduction 1576 of CIDR around 1994 has now once again been replaced by near- 1577 exponential growth in the number of routing objects. The granularity of 1578 many of the objects advertised in the default free zone is very small 1579 (prefix length of 22 or longer); this granularity appears to be a by- 1580 product of attempts to perform precision traffic engineering related 1581 to increasing levels of multi-homing. At present there is no 1582 mechanism in BGP that would allow an AS to aggregate such prefixes 1583 without advance knowledge of their existence, even if it were possible 1584 to deduce automatically that they could be aggregated. Achieving 1585 satisfactory auto-aggregation would also significantly reduce the 1586 non-locality problems associated with instability in peripheral ASs. 1588 On the other hand, it may be that alterations to the connectivity of 1589 the net as described in [RFC3221] and Section 2.5.1 may limit the 1590 usefulness of auto-aggregation. 1592 5.2. Convergence and Recovery Issues 1594 BGP today is a stable protocol under most circumstances, but this has 1595 been achieved at the expense of making the convergence time of the 1596 inter-domain routing system very slow under some conditions. This 1597 has a detrimental effect on the recovery of the network from 1598 failures.
1600 The timers that control the behavior of BGP are typically set to 1601 values in the region of several tens of seconds to a few minutes, 1602 which constrains the responsiveness of BGP to failure conditions. 1604 In the early days of deployment of BGP, poor network stability and 1605 router software problems led to storms of withdrawals closely 1606 followed by re-advertisements of many prefixes. To control the load 1607 on routing software imposed by these "route flaps", route flap 1608 damping was introduced into BGP. Most operators have now implemented 1609 a degree of route flap damping in their deployments of BGP. This 1610 restricts the number of times that the routing tables will be rebuilt 1611 even if a route is going up and down very frequently. Unfortunately, 1612 route flap damping responds to multiple flaps by increasing the route 1613 suppression time exponentially, which can result in some parts of the 1614 Internet being unreachable for hours at a time. 1616 There is evidence ([RFC3221] and measurements by some of the Sub- 1617 group B members [Jiang02]) that in today's network route flap is 1618 disproportionately associated with the fine-grain prefixes (length 22 1619 or longer) associated with traffic engineering at the periphery of 1620 the network. Auto-aggregation as previously discussed would tend to 1621 mask such instability and prevent it being propagated across the 1622 whole network. Another question that needs to be studied is the 1623 continuing need for an architecture that requires global convergence. 1624 Some of our studies (unpublished) show that, in some localities at 1625 least, the network never actually reaches stability; i.e., it never 1626 really globally converges. Can a global, or larger, network be 1627 designed with the requirement of global convergence? 1629 5.3.
Non-locality of Effects of Instability and Misconfiguration 1631 There have been a number of instances, some of which are well 1632 documented, of a mistake in BGP configuration in a single peripheral 1633 AS propagating across the whole Internet and resulting in misrouting 1634 of most of the traffic in the Internet. 1636 Similarly, a single route flap in a single peripheral AS can require 1637 route table recalculation across the entire Internet. 1639 This non-locality of effects is highly undesirable, and it would be a 1640 considerable improvement if such effects were naturally limited to a 1641 small area of the network around the problem. This is another 1642 argument for an architecture that does not require global 1643 convergence. 1645 5.4. Multihoming Issues 1647 As discussed previously, the increasing use of multi-homing as a 1648 robustness technique by peripheral networks requires that multiple 1649 routes have to be advertised for such domains. These routes must not 1650 be aggregated close to the multi-homed domain, as this would defeat 1651 the traffic engineering implied by multi-homing, and currently cannot 1652 be aggregated further away from the multi-homed domain due to the 1653 lack of auto-aggregation capabilities. Consequently the default 1654 free zone routing table is growing exponentially, as it was before 1655 CIDR. 1657 The longest prefix match routing technique introduced by CIDR, and 1658 implemented in BGP-4, when combined with provider address allocation, 1659 is an obstacle to effective multi-homing if load sharing across the 1660 multiple links is required. If an AS has been allocated its 1661 addresses from an upstream provider, the upstream provider can 1662 aggregate those addresses with those of other customers and need only 1663 advertise a single prefix for a range of customers.
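The aggregation step an upstream provider performs can be sketched with Python's ipaddress module; the customer prefixes here are hypothetical documentation addresses:

```python
import ipaddress

# Four hypothetical customer allocations carved from one provider block.
customers = [ipaddress.ip_network(p) for p in
             ("192.0.2.0/26", "192.0.2.64/26",
              "192.0.2.128/26", "192.0.2.192/26")]

# The provider that made the allocations can announce one covering
# prefix in place of the four more-specific routes.
aggregate = list(ipaddress.collapse_addresses(customers))
print(aggregate)  # [IPv4Network('192.0.2.0/24')]
```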
But, if the 1664 customer AS is also connected to another provider, the second 1665 provider is not able to aggregate the customer addresses because they 1666 are not taken from its allocation, and will therefore have to 1667 announce a more specific route to the customer AS. The longest match 1668 rule will then direct all traffic through the second provider, which 1669 is not the desired behaviour. 1671 Example: 1673 \ / 1674 AS1 AS2 1675 \ / 1676 AS3 1678 Figure 1: Address Aggregation 1680 In Figure 1, AS3 has received its addresses from AS1, which means AS1 1681 can aggregate. But if AS3 wants its traffic to be seen equally both 1682 ways, AS3 is forced to announce both the aggregate and the more 1683 specific route to AS2. 1685 This problem has induced many ASs to apply for their own address 1686 allocation even though they could have been allocated from an 1687 upstream provider, further exacerbating the default free zone route 1688 table size explosion. This problem also interferes with the desire 1689 of many providers in the default free zone to route only prefixes 1690 that are equal to or shorter than 20 or 19 bits. 1692 Note that some problems which are referred to as multihoming issues 1693 are not, and should not be, solvable through the routing system 1694 (e.g., where a TCP load distributor is needed), and multihoming is 1695 not a panacea for the general problem of robustness in a routing 1696 system [I-D.berkowitz-multireq]. 1698 Editors' Note: A more recent analysis of multihoming can be found 1699 in [RFC4116]. 1701 5.5. AS-number Exhaustion 1703 The domain identifier or AS-number is a 16-bit number. When this 1704 paper was originally written in 2001, allocation of AS-numbers was 1705 increasing 51% a year [RFC3221] and exhaustion by 2005 was predicted.
1706 According to more recent work by Huston [Huston05], the rate of 1707 increase dropped off after the business downturn, but as of July 2005 1708 well over half the available AS numbers (39000 out of 64510) had been 1709 allocated by IANA and around 20000 were visible in the global BGP 1710 routing tables. A year later these figures had grown to 42000 (April 1711 2006) and 23000 (August 2006) respectively, and the rate of allocation 1712 is currently about 3500 per year. Depending on the curve-fitting 1713 model used to predict when exhaustion will occur, the pool will run 1714 out somewhere between 2010 and 2013. There appear to be other 1715 factors at work in this rate of increase beyond an increase in the 1716 number of ISPs in business, although there is a fair degree of 1717 correlation between these numbers. AS numbers are now used for a 1718 number of purposes beyond that of identifying large routing domains: 1719 multihomed sites acquire an AS number in order to express routing 1720 preferences to their various providers, and AS numbers are used as part 1721 of the addressing mechanism for MPLS/BGP-based virtual private 1722 networks (VPNs) [RFC4364]. The IETF has had a proposal under 1723 development for over four years to increase the available range of 1724 AS-numbers to 32 bits [RFC4893]. Much of the slowness in development 1725 is due to the deployment challenge during transition. Because of the 1726 difficulties of transition, deployment needs to start well in advance 1727 of actual exhaustion so that the network as a whole is ready for the 1728 new capability when it is needed.
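For illustration, 4-octet AS numbers can be written either as plain integers ('asplain') or as two 16-bit halves separated by a dot ('asdot'); the helper below is a hypothetical sketch of that mapping, not part of any cited specification:

```python
# Hypothetical helper: render a 4-octet AS number in "asdot" form,
# where values above 65535 are written <high 16 bits>.<low 16 bits>.
def asdot(asn: int) -> str:
    if asn < 65536:
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"

print(asdot(64512))   # "64512": fits in the original 16-bit space
print(asdot(65536))   # "1.0": the first AS number beyond 16 bits
```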
This implies that standardisation 1729 needs to be complete and implementations available well in 1730 advance of expected exhaustion, so that deployment of upgrades that 1731 can handle the longer AS numbers can start around 2008 to 1732 give a reasonable expectation that the change will have been rolled out 1733 across a large fraction of the Internet by the time exhaustion 1734 occurs. 1735 Editors' Note: The RIRs are planning to move to assignment of the 1736 longer AS numbers by default on 1 January 2009, but there are 1737 concerns that significant numbers of routers will not have been 1738 upgraded by then. 1740 5.6. Partitioned ASs 1742 Tricks with discontinuous ASs are used by operators, for example, to 1743 implement anycast. Discontinuous ASs may also come into being by 1744 chance if a multi-homed domain becomes partitioned as a result of a 1745 fault and part of the domain can access the Internet through each 1746 connection. It may be desirable to make support for this kind of 1747 situation more transparent than it is at present. 1749 5.7. Load Sharing 1751 Load splitting or sharing was not a goal of the original designers of 1752 BGP, and it is now a problem for today's network designers and 1753 managers. Trying to fool BGP into load sharing between several links 1754 is a constantly recurring exercise for most operators today. 1756 5.8. Hold-down Issues 1758 As with the interval between 'hello' messages in OSPF, the typical 1759 size and defined granularity (seconds to tens of seconds) of the 1760 'keep-alive' time negotiated at start-up for each BGP connection 1761 constrains the responsiveness of BGP to link failures. 1763 The recommended values and the available lower limit for this timer 1764 were set to limit the overhead caused by keep-alive messages when 1765 link bandwidths were typically much lower than today.
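The overhead argument can be made concrete with some rough, illustrative arithmetic (a BGP KEEPALIVE is just the 19-octet message header; link-layer framing overhead is ignored here):

```python
# Fraction of link capacity consumed by a single session's KEEPALIVEs.
KEEPALIVE_BITS = 19 * 8  # a KEEPALIVE is a 19-octet BGP message

for interval_s, link_bps in ((30, 64_000), (1, 10_000_000_000)):
    fraction = KEEPALIVE_BITS / (interval_s * link_bps)
    print(f"{interval_s:>2}s interval on {link_bps:>11} bit/s link: "
          f"{fraction:.1e} of capacity")
```

Even a one-second interval on a modern high-speed link consumes a vanishingly small fraction of capacity, consistent with the observation below that faster links could sustain much higher keep-alive rates.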
Analysis and 1766 experiment ([I-D.alaettinoglu-isis-convergence], [I-D.sandiick-flip] 1767 and [RFC4204]) indicate that faster links could sustain a much higher 1768 rate of keep-alive messages without significantly impacting normal 1769 data traffic. This would improve responsiveness to link and node 1770 failures, but with a corresponding increase in the risk of 1771 instability if the error characteristics of the link are not taken 1772 properly into account when setting the keep-alive interval. 1774 Editors' Note: A 'fast' liveness protocol has been standardized as 1775 [I-D.ietf-bfd-base]. 1777 An additional problem with the hold-down mechanism in BGP is the 1778 amount of information that has to be exchanged to re-establish the 1779 database of route advertisements on each side of the link when it is 1780 re-established after a failure. Currently any failure, however brief, 1781 forces a full exchange; this could perhaps be constrained by 1782 retaining some state across limited-time failures and using revision 1783 control, transaction and replication techniques to resynchronise the 1784 databases. Various techniques have been implemented to try to reduce 1785 this problem but they have not yet been standardised. 1787 5.9. Interaction between Inter-Domain Routing and Intra-Domain Routing 1789 Today, many operators' backbone routers run both I-BGP and an intra- 1790 domain protocol to maintain the routes that reach between the borders 1791 of the domain. Exporting routes from BGP into the intra-domain 1792 protocol in use and bringing them back up to BGP is not recommended 1793 [RFC2791], but it is still necessary for all backbone routers to run 1794 both protocols. BGP is used to find the egress point and the intra- 1795 domain protocol to find the path (next hop router) to the egress 1796 point across the domain.
This is not only a management problem but 1797 may also create other problems: 1798 o BGP is a path vector protocol (i.e., a protocol that uses distance 1799 metrics possibly overridden by policy metrics), whereas most 1800 intra-domain protocols are link state protocols. As such, BGP is 1801 not optimised for convergence speed, although distance vector 1802 algorithms generally require less processing power. Incidentally, 1803 more efficient distance vector algorithms are available, such as 1804 [Xu97]. 1805 o The metrics used in BGP and the intra-domain protocol are rarely 1806 comparable or combinable. Whilst there are arguments that the 1807 optimizations inside a domain may be different from those for end- 1808 to-end paths, there are occasions, such as calculating the 1809 'topologically nearest' server, when computable or combinable 1810 metrics would be of assistance. 1812 o The policies that can be implemented using BGP are designed for 1813 control of traffic exchange between operators, not for controlling 1814 paths within a domain. Policies for BGP are most conveniently 1815 expressed in the Routing Policy Specification Language (RPSL) [RFC2622] and 1816 this could be extended, if thought desirable, to include additional 1817 policy information. 1818 o If the NEXT_HOP destination for a set of BGP routes becomes 1819 inaccessible because of intra-domain protocol problems, the routes 1820 using the vanished next hop have to be invalidated at the next 1821 available UPDATE. Subsequently, if the next hop route reappears, 1822 this would normally lead to the BGP speaker requesting a full 1823 table from its neighbour(s). Current implementations may attempt 1824 to circumvent the effects of intra-domain protocol route flap by 1825 caching the invalid routes for a period in case the next hop is 1826 restored, through the 'graceful restart' mechanism. 1828 * Editors' Note: This was standardized as [RFC4724].
1830 o Synchronization between intra-domain and inter-domain routing 1831 information is a problem as long as we use different protocols for 1832 intra-domain and inter-domain routing, which will most probably be 1833 the case even in the future because of the differing requirements 1834 in the two situations. Some sort of synchronization between those 1835 two protocols would be useful. In the RFC 'IS-IS Transient 1836 Blackhole Avoidance' [RFC3277], the intra-domain protocol side of 1837 the story is covered (there is an equivalent discussion for OSPF). 1838 o Synchronizing in BGP means waiting for the intra-domain protocol 1839 to know about the same networks as the inter-domain protocol, 1840 which can take a significant period of time and slows down the 1841 convergence of BGP by adding the intra-domain protocol convergence 1842 time into each cycle. Operators generally no longer attempt full 1843 synchronization in order to avoid this problem (in general, 1844 redistributing the entire BGP routing feed into the local intra- 1845 domain protocol is unnecessary and undesirable, but where a domain 1846 has multiple exits to peers and other non-customer networks, 1847 changes in BGP routing that affect the exit taken by traffic 1848 require corresponding re-routing in the intra-domain routing). 1850 5.10. Policy Issues 1852 There are several classes of issues with current BGP policy: 1853 o Policy is installed in an ad-hoc manner in each autonomous system. 1854 There is no method for ensuring that the policy installed in one 1855 router is coherent with policies installed in other routers. 1856 o As described by Griffin [Griffin99] and McPherson [RFC3345], it 1857 is possible to create policies for ASs, and instantiate them in 1858 routers, that will cause BGP to fail to converge in certain types 1859 of topology. 1861 o There is no available network model for describing policy in a 1862 coherent manner.
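The convergence failure identified by Griffin can be illustrated with the classic three-AS 'bad gadget', sketched here as synchronous best-response dynamics (a simplification: real BGP updates asynchronously, and the AS numbers and notation are hypothetical):

```python
# Griffin's "bad gadget": each of three ASs prefers the route through
# its clockwise neighbour to its direct route to destination 0, so no
# stable routing exists and the system oscillates forever.
prefs = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 3, 0), (2, 0)],
    3: [(3, 1, 0), (3, 0)],
}
chosen = {asn: (asn, 0) for asn in prefs}  # start on the direct routes

seen = set()
for step in range(20):
    new = {}
    for asn, ranked in prefs.items():
        for path in ranked:
            next_hop = path[1]
            # A path is usable if it goes straight to the destination or
            # extends the path its next hop chose in the previous round.
            if next_hop == 0 or chosen[next_hop] == path[1:]:
                new[asn] = path
                break
    state = tuple(sorted(new.items()))
    if state in seen:
        print(f"state repeats at step {step}: no convergence")
        break
    seen.add(state)
    chosen = new
```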
1864 Policy management is extremely complex and mostly done without the 1865 aid of any automated procedures. The extreme complexity means that a 1866 highly qualified specialist is required for policy management of 1867 border routers. The training of these specialists is quite lengthy 1868 and needs to involve long periods of hands-on experience. There is, 1869 therefore, a shortage of qualified staff for installing and 1870 maintaining the routing policies. Because of the overall complexity 1871 of BGP, policy management tends to be only a relatively small topic 1872 within a complete BGP training course and specialised policy 1873 management training courses are not generally available. 1875 5.11. Security Issues 1877 While many of the issues with BGP security have been traced either to 1878 implementation issues or to operational issues, BGP is vulnerable to 1879 Distributed Denial of Service (DDoS) attacks. Additionally routers 1880 can be used as unwitting forwarders in DDoS attacks on other systems. 1882 Though DDoS attacks can be fought in a variety of ways, mostly using 1883 filtering methods, it takes constant vigilance. There is nothing in 1884 the current architecture or in the protocols that serves to protect 1885 the forwarders from these attacks. 1887 Editors' Note: Since the original draft was written, the issue of 1888 inter-domain routing security has been studied in much greater 1889 depth. The rpsec working group has gone into the security issues 1890 in great detail [RFC4593] and readers should refer to that work to 1891 understand the security issues. 1893 5.12. Support of MPLS and VPNS 1895 Recently BGP has been modified to function as a signaling protocol 1896 for MPLS and for VPNs [RFC4364]. Some people see this over-loading 1897 of the BGP protocol as a boon whilst others see it as a problem. 
1898 While it was certainly convenient as a vehicle for vendors to deliver 1899 extra functionality to their products, it has exacerbated some of 1900 the performance and complexity issues of BGP. Two important problems 1901 are the additional state that must be retained and refreshed to 1902 support VPN (Virtual Private Network) tunnels, and the lack of 1903 end-to-end notification in BGP, which makes it difficult to confirm that 1904 all necessary state has been installed or updated. 1906 It is an open question whether VPN signaling protocols should remain 1907 separate from the route determination protocols. 1909 5.13. IPv4 / IPv6 Ships in the Night 1911 The fact that service providers need to maintain two completely 1912 separate networks, one for IPv4 and one for IPv6, has been a real 1913 hindrance to the introduction of IPv6. When IPv6 does get widely 1914 deployed, it will do so without causing the disappearance of IPv4. 1915 This means that unless something is done, service providers would 1916 need to maintain the two networks in perpetuity (at least on the 1917 foreshortened timescale which the Internet world uses). 1919 It is possible to use a single set of BGP speakers with multiprotocol 1920 extensions [RFC4760] to exchange information about both IPv4 and IPv6 1921 routes between domains, but the use of TCP as the transport protocol 1922 for the information exchange results in an asymmetry when choosing to 1923 use one of TCP over IPv4 or TCP over IPv6. Successful information 1924 exchange confirms reachability between the speakers for one of IPv4 or 1925 IPv6 but not the other, making it possible that reachability is 1926 being advertised for a protocol for which it is not present.
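The multiprotocol extensions distinguish routes by an (AFI, SAFI) pair; a minimal sketch using registered IANA values (the pairing helper itself is hypothetical):

```python
# Address Family Identifier / Subsequent AFI values (IANA registries)
# used by the BGP multiprotocol extensions [RFC4760].  One session,
# over either IPv4 or IPv6 transport, can carry routes for both.
AFI = {1: "IPv4", 2: "IPv6"}
SAFI = {1: "unicast", 2: "multicast", 128: "MPLS-labelled VPN"}

def family(afi: int, safi: int) -> str:
    return f"{AFI[afi]} {SAFI[safi]}"

print(family(1, 1))  # IPv4 unicast
print(family(2, 1))  # IPv6 unicast
```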
Also, current implementations do not allow a route to be advertised
for both IPv4 and IPv6 in the same UPDATE message, because it is not
possible to explicitly link the reachability information for an
address family to the corresponding next-hop information.  This could
be improved, but it currently results in independent UPDATEs being
exchanged for each address family.

5.14.  Existing Tools to Support Effective Deployment of Inter-Domain
       Routing

The tools available to network operators to assist in configuring and
maintaining effective inter-domain routing in line with their defined
policies are limited, and almost entirely passive.

o  There are no tools to facilitate the planning of the routing of a
   domain (either intra- or inter-domain); there are a limited number
   of display tools that will visualize the routing once it has been
   configured.
o  There are no tools to assist in converting business policy
   specifications into the Routing Policy Specification Language
   (RPSL) (see Section 5.14.1); there are limited tools to convert
   RPSL into BGP commands and to check, post facto, that the proposed
   policies are consistent with the policies in adjacent domains
   (always provided that these have been revealed and accurately
   documented).
o  There are no tools to monitor BGP route changes in real time and
   warn the operator about policy inconsistencies and/or
   instabilities.

The following section summarises the tools that are available to
assist with the use of RPSL.  Note that they are all batch-mode tools
used off-line from a real network.  These tools provide checks for
skilled inter-domain routing configurers, but only limited assistance
for the novice.

5.14.1.
Routing Policy Specification Language RPSL (RFC 2622, RFC 2650) and
        RIPE NCC Database (RIPE 157)

The Routing Policy Specification Language (RPSL) [RFC2622] enables a
network operator to describe the routes, routers, and autonomous
systems (ASs) that are connected to the local AS.

Using RPSL (see [RFC2650]), a distributed database is created that
describes the routing policies in the Internet, with each AS
described independently.  The database can be used to check the
consistency of the routing policies stored in it.

Tools exist ([IRRToolSet]) that can use the database to (among other
things):
o  Flag cases where two neighboring network operators specify
   conflicting or inconsistent routing information exchanges with
   each other, and also detect global inconsistencies where possible;
o  Extract from the routing policy database all AS-paths between two
   networks that are allowed by routing policy; display the
   connectivity a given network has according to current policies.

The database queries enable a partial static solution to the
convergence problem.  They analyze the routing policies of a very
limited part of the Internet and verify that they do not contain
conflicts that could lead to protocol divergence.  Static analysis of
the convergence of the entire system has exponential time complexity,
so approximation algorithms would have to be used.

The toolset also allows router configurations to be generated from
RPSL specifications.

   Editors' Note: The "Internet Routing Registry Toolset" was
   originally developed by the University of Southern California's
   Information Sciences Institute (ISI) between 1997 and 2001 as the
   "Routing Arbiter ToolSet" (RAToolSet) project.
   The toolset is no longer developed by ISI but is used worldwide;
   after a period of improvement by the RIPE NCC, it has now been
   transferred to the Internet Systems Consortium (ISC) for ongoing
   maintenance as a public resource.

6.  Security Considerations

As this is an informational draft on the history of requirements in
inter-domain routing (IDR) and on the problems facing the current
Internet IDR architecture, it does not in itself create any security
problems.  On the other hand, some of the problems with today's
Internet routing architecture do create security problems, and these
have been discussed in the text above.

7.  IANA Considerations

This document does not request any actions by IANA.

RFC Editor: Please remove this section before publication.

8.  Acknowledgments

This draft is derived from work originally produced by Babylon.
Babylon was a loose association of individuals from academia, service
providers, and vendors whose goal was to discuss issues in Internet
routing with the intention of finding solutions for those problems.

The individual members who contributed materially to this draft are:
Anders Bergsten, Howard Berkowitz, Malin Carlzon, Lenka Carr
Motyckova, Elwyn Davies, Avri Doria, Pierre Fransson, Yong Jiang,
Dmitri Krioukov, Tove Madsen, Olle Pers, and Olov Schelen.

Thanks also go to the members of Babylon and others who did
substantial reviews of this material.  Specifically, we would like to
acknowledge the helpful comments and suggestions of the following
individuals: Loa Andersson, Tomas Ahlstrom, Erik Aman, Thomas
Eriksson, Niklas Borg, Nigel Bragg, Thomas Chmara, Krister Edlund,
Owe Grafford, Susan Hares, Torbjorn Lundberg, David McGrew, Jasminko
Mulahusic, Florian-Daniel Otel, Bernhard Stockman, Tom Worster, and
Roberto Zamparo.
In addition, the authors are indebted to the folks who wrote all the
references we have consulted in putting this paper together.  This
includes not only the references explicitly listed below, but also
those who contributed to the mailing lists we have been participating
in for years.

Finally, it is the editors who are responsible for any lack of
clarity, any errors, glaring omissions, or misunderstandings.

9.  Informative References

[Blumenthal01]
           Blumenthal, M. and D. Clark, "Rethinking the design of the
           Internet: The end to end arguments vs. the brave new
           world", May 2001.

[Breslau90]
           Breslau, L. and D. Estrin, "An Architecture for Network-
           Layer Routing in OSI", Proceedings of the ACM Symposium on
           Communications Architectures & Protocols, 1990.

[Chapin94]
           Piscitello, D. and A. Chapin, "Open Systems Networking:
           TCP/IP & OSI", Addison-Wesley (copyright assigned to the
           authors), 1994.

[Chiappa91]
           Chiappa, N., "A New IP Routing and Addressing
           Architecture", draft-chiappa-routing-01.txt (work in
           progress), 1991.

[Griffin99]
           Griffin, T. and G. Wilfong, "An Analysis of BGP
           Convergence Properties", Proceedings of ACM SIGCOMM '99,
           1999.

[Huitema90]
           Huitema, C. and W. Dabbous, "Routeing protocols
           development in the OSI architecture", Proceedings of
           ISCIS V, Turkey, 1990.

[Huston05]
           Huston, G., "Exploring Autonomous System Numbers", The ISP
           Column, August 2005.

[I-D.alaettinoglu-isis-convergence]
           Alaettinoglu, C., Jacobson, V., and H. Yu, "Towards Milli-
           Second IGP Convergence",
           draft-alaettinoglu-isis-convergence-00 (work in progress),
           November 2000.

[I-D.berkowitz-multireq]
           Berkowitz, H. and D. Krioukov, "To Be Multihomed:
           Requirements and Definitions", draft-berkowitz-multireq-02
           (work in progress), 2001.
[I-D.ietf-bfd-base]
           Katz, D. and D. Ward, "Bidirectional Forwarding
           Detection", draft-ietf-bfd-base-09 (work in progress),
           February 2009.

[I-D.irtf-routing-reqs]
           Doria, A., Davies, E., and F. Kastenholz, "A Set of
           Possible Requirements for a Future Routing Architecture",
           draft-irtf-routing-reqs-10 (work in progress),
           January 2009.

[I-D.sandiick-flip]
           Sandick, H., Squire, M., Cain, B., Duncan, I., and B.
           Haberman, "Fast LIveness Protocol (FLIP)",
           draft-sandiick-flip-00 (work in progress), February 2000.

[INARC89]  Mills, D., Ed. and M. Davis, Ed., "Internet Architecture
           Workshop: Future of the Internet System Architecture and
           TCP/IP Protocols - Report", Internet Architecture Task
           Force (INARC), 1990.

[IRRToolSet]
           Internet Systems Consortium, "Internet Routing Registry
           Toolset Project", IRRToolSet Website, 2006.

[ISO10747] ISO/IEC, "Protocol for Exchange of Inter-Domain Routeing
           Information among Intermediate Systems to support
           Forwarding of ISO 8473 PDUs", International Standard
           10747, 1993.

[Jiang02]  Jiang, Y., Doria, A., Olsson, D., and F. Pettersson,
           "Inter-domain Routing Stability Measurement", 2002.

[Labovitz02]
           Labovitz, C., Ahuja, A., Farnam, J., and A. Bose,
           "Experimental Measurement of Delayed Convergence", NANOG,
           2002.

[NewArch03]
           Clark, D., Sollins, K., Wroclawski, J., Katabi, D., Kulik,
           J., Yang, X., Braden, R., Faber, T., Falk, A., Pingali,
           V., Handley, M., and N. Chiappa, "New Arch: Future
           Generation Internet Architecture", December 2003.

[RFC0904]  Mills, D., "Exterior Gateway Protocol formal
           specification", RFC 904, April 1984.

[RFC0975]  Mills, D., "Autonomous confederations", RFC 975,
           February 1986.

[RFC1105]  Lougheed, K. and Y. Rekhter, "Border Gateway Protocol
           (BGP)", RFC 1105, June 1989.
[RFC1126]  Little, M., "Goals and functional requirements for inter-
           autonomous system routing", RFC 1126, October 1989.

[RFC1163]  Lougheed, K. and Y. Rekhter, "Border Gateway Protocol
           (BGP)", RFC 1163, June 1990.

[RFC1267]  Lougheed, K. and Y. Rekhter, "Border Gateway Protocol 3
           (BGP-3)", RFC 1267, October 1991.

[RFC1752]  Bradner, S. and A. Mankin, "The Recommendation for the IP
           Next Generation Protocol", RFC 1752, January 1995.

[RFC1753]  Chiappa, J., "IPng Technical Requirements Of the Nimrod
           Routing and Addressing Architecture", RFC 1753,
           December 1994.

[RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
           (BGP-4)", RFC 1771, March 1995.

[RFC1992]  Castineyra, I., Chiappa, N., and M. Steenstrup, "The
           Nimrod Routing Architecture", RFC 1992, August 1996.

[RFC2362]  Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering,
           S., Handley, M., and V. Jacobson, "Protocol Independent
           Multicast - Sparse Mode (PIM-SM): Protocol Specification",
           RFC 2362, June 1998.

[RFC2622]  Alaettinoglu, C., Villamizar, C., Gerich, E., Kessens, D.,
           Meyer, D., Bates, T., Karrenberg, D., and M. Terpstra,
           "Routing Policy Specification Language (RPSL)", RFC 2622,
           June 1999.

[RFC2650]  Meyer, D., Schmitz, J., Orange, C., Prior, M., and C.
           Alaettinoglu, "Using RPSL in Practice", RFC 2650,
           August 1999.

[RFC2791]  Yu, J., "Scalable Routing Design Principles", RFC 2791,
           July 2000.

[RFC3221]  Huston, G., "Commentary on Inter-Domain Routing in the
           Internet", RFC 3221, December 2001.

[RFC3277]  McPherson, D., "Intermediate System to Intermediate System
           (IS-IS) Transient Blackhole Avoidance", RFC 3277,
           April 2002.

[RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
           "Border Gateway Protocol (BGP) Persistent Route
           Oscillation Condition", RFC 3345, August 2002.

[RFC3618]  Fenner, B. and D.
Meyer, "Multicast Source Discovery
           Protocol (MSDP)", RFC 3618, October 2003.

[RFC3765]  Huston, G., "NOPEER Community for Border Gateway Protocol
           (BGP) Route Scope Control", RFC 3765, April 2004.

[RFC3913]  Thaler, D., "Border Gateway Multicast Protocol (BGMP):
           Protocol Specification", RFC 3913, September 2004.

[RFC4116]  Abley, J., Lindqvist, K., Davies, E., Black, B., and V.
           Gill, "IPv4 Multihoming Practices and Limitations",
           RFC 4116, July 2005.

[RFC4204]  Lang, J., "Link Management Protocol (LMP)", RFC 4204,
           October 2005.

[RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
           Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
           Networks (VPNs)", RFC 4364, February 2006.

[RFC4593]  Barbir, A., Murphy, S., and Y. Yang, "Generic Threats to
           Routing Protocols", RFC 4593, October 2006.

[RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
           "Protocol Independent Multicast - Sparse Mode (PIM-SM):
           Protocol Specification (Revised)", RFC 4601, August 2006.

[RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
           Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
           January 2007.

[RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
           "Multiprotocol Extensions for BGP-4", RFC 4760,
           January 2007.

[RFC4893]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
           Number Space", RFC 4893, May 2007.

[Tsuchiya87]
           Tsuchiya, P., "An Architecture for Network-Layer Routing
           in OSI", Proceedings of the ACM Workshop on Frontiers in
           Computer Communications Technology, 1987.

[Xu97]     Xu, Z., Dai, S., and J. Garcia-Luna-Aceves, "A More
           Efficient Distance Vector Routing Algorithm", Proc. IEEE
           MILCOM '97, Monterey, California, November 1997.

Authors' Addresses

Elwyn B.
Davies
Folly Consulting
Soham, Cambs
UK

Phone: +44 7889 488 335
Email: elwynd@dial.pipex.com


Avri Doria
LTU
Lulea, 971 87
Sweden

Phone: +1 401 663 5024
Email: avri@acm.org