Internet Engineering Task Force                          Dimitry Haskin
Internet Draft                                        Bay Networks, Inc.
Expires September 1995                                       March 1995
draft-haskin-bgp-idrp-mesh-routing-00.txt

      A BGP/IDRP Route Server alternative to a full mesh routing

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. (Note that other groups may also distribute
working documents as Internet Drafts).

Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
Internet Draft.

Abstract

This document describes the use and detailed design of Route Servers
for dissemination of routing information among BGP/IDRP speaking
routers.

The intention of the proposed technique is to reduce overhead and
management complexity of maintaining numerous direct BGP/IDRP
sessions which otherwise might be required or desired among routers
within a single routing domain as well as among routers in different
domains that are connected to a common switched fabric (e.g. an ATM
cloud).

1. Overview

Current deployments of Exterior Routing protocols, such as the Border
Gateway Protocol [BGP4] and the adaptation of the ISO Inter-Domain
Routing Protocol [IDRP], require that all BGP/IDRP routers that
participate in inter-domain routing (border routers) and belong to the
same routing domain establish a full mesh connectivity with each other
for the purpose of exchanging routing information acquired from other
routing domains. In large routing domains the number of intra-domain
connections that need to be maintained by each border router can be
significant.

In addition, it may be desired for a border router to establish routing
sessions with all border routers in other domains which are reachable
via a shared communication medium. We refer to routers that are
directly reachable via a shared medium as adjacent routers. Such direct
peering allows a router to acquire "first hand" information about
destinations which are directly reachable through adjacent routers and
select the optimum direct paths to these destinations. Establishment of
BGP/IDRP sessions among all adjacent border routers would result in
full mesh routing connectivity. Unfortunately, for switched media such
as ATM, SMDS or Frame Relay networks, which may interconnect a large
number of routers, the number of connections that would be needed to
maintain full mesh direct peering between the routers makes this
approach impractical.

In order to alleviate the "full mesh" problem, this paper proposes to
use IDRP/BGP Route Servers which would relay external routes with all
of their attributes between client routers. The clients would maintain
IDRP/BGP sessions only with the assigned route servers (sessions with
more than one server would be needed if redundancy is desired).
Since all external routes and their attributes are relayed unmodified
between the client routers, the client routers would acquire the same
routing information as they would via direct peering. We refer to such
an arrangement as virtual peering. Virtual peering allows client
routers to independently apply selection criteria to acquired external
routes according to their local policies, as they would if direct
peering were established.

The routing approach described in this paper assumes that border
routers possess a mechanism to resolve the media access address of the
next hop router for any route acquired from a virtual peer.

It is fair to note that the approach presented in this paper only
reduces the number of routing connections each border router needs to
maintain. It does not reduce the volume of routing information that
needs to be maintained at each border router.

Besides addressing the "full mesh" problems, the proposal attempts to
achieve the following goals:

- to minimize BGP/IDRP changes that need to be implemented in client
  routers in order to inter-operate with route servers;

- to provide for redundancy of distribution of routing information to
  route server clients;

- to minimize the amount of routing updates that have to be sent to
  route server clients;

- to provide load distribution between route servers;

- to avoid an excessive complexity of the interactions between Route
  Servers themselves.

2. Terms And Acronyms

The following terms and acronyms are used in this paper:

Routing Domain     - a collection of routers with the same set of
                     routing policies. For IPv4 it can be identified
                     with an Autonomous System Number, for IPv6
                     it can be identified with a Routing Domain
                     Identifier.

Border Router (BR) - a router that acquires external routes, i.e.
                     routes to internet points outside its routing
                     domain.

Route Server (RS)  - a process that collects routing information
                     from border routers and distributes this
                     information to 'client routers'.

RS Client (RC)     - a router that peers with an RS in order to
                     acquire routing information. A server's client
                     can be a router or another route server.

RS Cluster (RSC)   - two or more route servers that share the same
                     subset of clients. An RS Cluster provides
                     redundancy of routing information to its
                     clients, i.e. routing information is provided
                     to all RS Cluster clients as long as there is
                     at least one functional route server in the RS
                     Cluster.

RCID               - Cluster ID

3. RS Model

In the proposed scheme a Route Server (RS) does not apply any selection
criteria to the routes received from border routers for the purpose
of distributing these routes to its clients. All routes acquired
from border routers or other Route Servers are relayed to the client
border routers.

There can be two classes of Route Server: Route Servers that relay
external routes between routers in a single routing domain and Route
Servers that relay external routes between border routers in different
routing domains. The former are Intra-Domain Route Servers and the
latter are Inter-Domain Route Servers.

In the RS model proposed in this document there is no routing exchange
between Intra-Domain Route Servers and Inter-Domain Route Servers.
Routes that cross a domain boundary must always pass through a border
router of such a domain which may apply administrative filters to such
routes.

Operations of Intra-Domain Route Servers and Inter-Domain Route Servers
are identical.

One or more Route Servers form an RS Cluster (RSC). For redundancy's
sake two or more RSs can be configured to operate in an RS Cluster.
All route servers in an RSC share the same clients, i.e.
cluster clients establish connections to all route servers in such an
RSC for the purpose of exchanging routing information. Each cluster is
assigned a unique RSC Identifier (RCID) represented by a 2-octet
unsigned integer.

Clusters which provide virtual connectivity between their clients would
normally exchange routing information among themselves so that all
external routes are propagated to all participating clients.

Though a Route Server Client (RC) can be associated with multiple RSCs,
there seems to be no real advantage in doing so except for a short
transition period to provide a graceful re-assignment from one RSC to
another or if, for some reason, there are multiple RS groups that don't
exchange routing information with each other.

The inter-cluster route exchange can be accomplished by forming
a full mesh routing adjacency between clusters. In this approach,
illustrated in the diagram below, each RS in each RSC would maintain
a routing connection with every RS in other RS clusters. Only routes
that are acquired from border routers are propagated to RSs in other
RS clusters.

   BR11  BR12    BR1n       BR21  BR22    BR2n
    |     |  ...  |          |     |  ...  |
  -----------------        ------------------
  ! RS11    RS12  !  ---   ! RS21     RS22  !
  -----------------        ------------------
           \                     /
            \                   /
              -----------------
              ! RS31    RS32  !
              -----------------
                |     |  ...  |
               BR31  BR32    BR3n

Another way to propagate routing information between clusters would be
to form a cluster hierarchy in which an RS in one cluster maintains
sessions only with RSs in designated clusters. In this approach an RS
must advertise all acquired routes to an RS in another cluster except
the routes that are acquired from that cluster. Nevertheless, it
allows for minimizing the number of routing sessions, which can be
highly desirable in some networks.
It is important for the hierarchical scheme that the inter-cluster
route exchange links form a tree, i.e. there is only one route
propagation path between any two clusters; otherwise routing loops may
result. For detection and pruning of routing loops in a hierarchical
cluster topology, it is advisable to include the RCID_PATH attribute
(see 4.3.4) in all routing updates sent between route servers. This
attribute lists the IDs of all clusters in the route propagation path.
When a duplicate ID is detected in this attribute, the offending route
needs to be discarded.

The diagram below, which illustrates the hierarchical approach, is
created from the diagram above by removing the route exchange link
between clusters 2 and 3.

   BR11  BR12    BR1n       BR21  BR22    BR2n
    |     |  ...  |          |     |  ...  |
  -----------------        ------------------
  ! RS11    RS12  !  ---   ! RS21     RS22  !
  -----------------        ------------------
           \
            \
              -----------------
              ! RS31    RS32  !
              -----------------
                |     |  ...  |
               BR31  BR32    BR3n

It seems that the only disadvantage of the hierarchical model is the
management headache of avoiding routing loops and redundant information
flow by ensuring that inter-cluster links always form a tree. But more
study is needed to fully evaluate the comparative merits of the
full-mesh and hierarchical models.

Since RSs in the same cluster maintain routing sessions with the same
set of clients, it may seem that there is no need to exchange routing
information between RSs in the same cluster. Nevertheless, such a route
exchange may help to maintain identical routing databases in the
servers during client acquisition periods and when a partial failure
may affect some routing sessions.

Route servers in the same RS cluster exchange control messages in an
attempt to subdivide the responsibilities of providing routing
information to their clients.
In order to simplify the RS design, the RS messaging is implemented on
top of the exterior protocol which is used by route servers for the
routing information exchange.

4. Operation

4.1 ADVERTISER Path Attribute
-----------------------------

Route servers act as concentrators for routes acquired by border
routers so that the border routers need to maintain routing
connections with only one or two designated route servers. Route
Servers distribute routing information that is provided to them by
the border routers to all their clients.

If routing information were relayed to RS clients in UPDATE messages
with only those path attributes that are currently defined in the
BGP-4/IDRP specification, the RS clients would not be able to associate
external routes they receive with the border routers which submitted
those routes to route servers. Such an association is necessary for
making a correct route selection decision. Therefore, a new path
attribute, ADVERTISER, is defined.

The ADVERTISER is an optional non-transitive attribute that
defines the identifying address of the border router which exported
the route to other RS clients. The Type Code of the ADVERTISER
attribute is 255. This attribute must be included in every UPDATE
message that is relayed by route servers and must be recognized by RS
clients.

4.2 Route Client Operation
--------------------------

An RS client establishes a BGP/IDRP connection to every route server
in the RS cluster to which the route client is assigned.

RS clients must be able to recognize the ADVERTISER path
attribute that is included in all UPDATE messages received from route
servers. Routes received in UPDATE messages from route servers are
processed as if they were received directly from the border routers
specified in the ADVERTISER attributes of the respective updates.

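As an illustration of the ADVERTISER attribute described in 4.1, the
sketch below encodes and decodes it using the generic BGP-4 path
attribute layout (flags octet, type code, one-octet length, value).
The type code 255 and the optional non-transitive flags are taken from
this document; the function names and the use of a 4-octet IPv4
identifying address are assumptions made only for this example.

```python
import ipaddress
import struct

ADVERTISER_TYPE = 255    # type code assigned in section 4.1
FLAG_OPTIONAL = 0x80     # BGP-4 attribute flag: optional
FLAG_TRANSITIVE = 0x40   # BGP-4 attribute flag: transitive (must stay clear)


def encode_advertiser(border_router: str) -> bytes:
    """Build an optional non-transitive ADVERTISER attribute (IPv4 assumed)."""
    value = ipaddress.IPv4Address(border_router).packed
    header = struct.pack("!BBB", FLAG_OPTIONAL, ADVERTISER_TYPE, len(value))
    return header + value


def decode_advertiser(attr: bytes) -> str:
    """Return the border router address carried in an ADVERTISER attribute."""
    flags, type_code, length = struct.unpack("!BBB", attr[:3])
    if type_code != ADVERTISER_TYPE or length != 4 or len(attr) != 3 + length:
        raise ValueError("not a well-formed IPv4 ADVERTISER attribute")
    if not flags & FLAG_OPTIONAL or flags & FLAG_TRANSITIVE:
        raise ValueError("ADVERTISER must be optional non-transitive")
    return str(ipaddress.IPv4Address(attr[3:]))
```

A client would use the decoded address as the peer from which the route
is treated as having been received.
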
If an RS client receives a route from an Intra-Domain Route Server,
it is assumed that the border router identified in the ADVERTISER
attribute is located in the receiving client's own routing domain.

If an RS client receives a route from an Inter-Domain Route Server,
the locality of the border router identified in the ADVERTISER
attribute can be determined from BGP's AS_PATH attribute or
IDRP's RD_PATH attribute respectively.

If no ADVERTISER attribute was included in an UPDATE message from
a route server, it is assumed that the route server itself is the
exporter of the corresponding route.

If the NEXT_HOP path attribute of an UPDATE message lists an address of
the receiving router itself, the route that is carried in such an
update message must be declared unreachable.

In addition, it is highly desirable, albeit not required, to slightly
modify the "standard" BGP/IDRP operation when acquiring routes from
RSs:

   when a route is received from an RS and a route with completely
   identical attributes has been previously acquired from another RS
   in the same cluster, the previously acquired route should be
   replaced with the newly acquired route. Such a route replacement
   should not trigger any route advertisement action on behalf of the
   route.

RSs are designed to operate in a way that eliminates the need for RS
clients to keep multiple copies of the same route and minimizes the
possibility of a route flap when the BGP/IDRP connection to one of the
redundant route servers is lost.

Route servers attempt to subdivide the route dissemination load such
that only one RS provides routing updates to a given client. But since,
in order to avoid excessive complexity, the reconciliation algorithm
does not completely eliminate the possibility of races, a client may
still receive updates from more than one route server.
Therefore, the client's ability to discard duplicate routes may reduce
the need for a bigger routing database.

4.3 Route Server Operation
--------------------------

A Route Server maintains BGP-4/IDRP sessions with its clients according
to the respective BGP-4/IDRP specification with the exception of the
protocol modifications outlined in this document.

UPDATE messages sent by route servers have the same format and
semantics as their respective BGP-4/IDRP counterparts but also carry
the ADVERTISER path attribute which specifies the BGP Identifier of the
border router that exported the route advertised in the UPDATE message.
In addition, if the hierarchical model is deployed to interconnect
Route Server clusters, it is advisable to include the RCID_PATH
attribute in all routing updates sent between route servers as
described in 4.3.4.

When route servers exchange OPEN messages they include the Route Server
protocol version (current version is 1) as well as the Cluster IDs of
their respective clusters in an Optional Parameter of the OPEN message.
The value of Parameter Type for this parameter is 255. The length of
the parameter data is 3 octets. The format of the parameter data is
shown below:

+-----------------------+------------------------------------+
| Version = 1 (1 octet) |       Cluster ID (2 octets)        |
+-----------------------+------------------------------------+

Also, route servers that belong to the same cluster send to each other
LIST messages with lists of clients to which they're providing routing
information. In the LIST message an RS specifies the Router Identifier
of each client to which that RS is providing routing updates. Since
LIST messages are relatively small, there is no need to add the
processing complexity of generating incremental updates when a list
changes; instead the complete list is sent when RSs need to be informed
of the changes.
The format of the LIST message is presented in 4.3.1.

4.3.1 LIST Message Format
-------------------------

The LIST message contains the fixed BGP/IDRP header that is followed
by the fields shown below. The type code in the fixed header of
the LIST message is 255.

+----------+----------+----------+----------+
|        Client Identifying Address         |  Repeated for each
+-------------------------------------------+  informed client

The number of "Client Identifying Address" fields is not encoded
explicitly, but can be calculated as:

   (LIST Message Length - Header Length) / Address Length,

where LIST Message Length is the value encoded in the fixed
BGP/IDRP header, Header Length is the length of that header, and
Address Length is 4 for IPv4 and 16 for IPv6.

4.3.2 External Route Acquisition And Advertisement
--------------------------------------------------

A route server acquires external routes from RS clients that are also
border routers. An RS also may acquire external routes from other RSs.
Route servers relay all acquired routes unaltered to their clients.
No route selection is performed for the purpose of route
re-advertisement to RS clients.

While route servers receive and store routing data from all their
clients, Route Servers in the same cluster coordinate their route
advertisement in an attempt to ensure that only one RS provides
routing updates to a given client. If an RS fails, other Route
Servers in the cluster take over the responsibility of providing
routing updates to the clients that were previously served by the
failed RS. A route flap that can result from such a switch-over can be
eliminated by configuring the "Hold Time" of the clients' BGP-4/IDRP
sessions with the route servers to be larger than the switch-over
time. The switch-over time is determined by the Hold Time of
BGP-4/IDRP sessions between the route servers in the cluster and the
period that is needed for those route servers to reconcile their route
advertisement responsibilities. The reconciliation protocol is
described in 4.3.3.

The BGP-4/IDRP operation of route servers differs from the "standard"
operation in the following ways:

   - when receiving routes from another RS, the RS Client mode of
     operation is assumed, i.e., when a route with completely
     identical attributes has been previously acquired from an RS
     belonging to the same cluster as the RS that advertises the new
     route, the previously acquired route should be discarded and
     the newly acquired route should be accepted. Such a route
     replacement should not trigger any route advertisement action
     on behalf of the route.

   - all acquired routes are advertised to a client router except
     routes which were acquired from that client (no route echoing);

   - if the hierarchical model of inter-cluster route exchange is
     used, all acquired routes are advertised to an RS in another
     RSC except routes that are acquired from that RSC. In the
     full-mesh model, only routes which are acquired from border
     routers are advertised to route servers in other clusters;

   - if route servers in the same RS cluster are configured to
     exchange routing information, only external routes that are
     acquired from border routers are advertised to route servers in
     the local cluster;

   - the ADVERTISER path attribute is included in every UPDATE
     message that is generated by an RS. This attribute must
     specify the identifying address of the border router from which
     the information provided in the UPDATE has been acquired. All
     other routing attributes should be relayed to the RS's peers
     unaltered.

   - when a route advertised to an RS by a client becomes
     unreachable, such a route needs to be declared unreachable to
     all other clients. In order to withdraw a route, the route
     server sends an UPDATE for that route to each client (except
     the client from which this route was originally acquired) with
     the NEXT_HOP path attribute set to the address of the client to
     which this UPDATE is sent. The ADVERTISER path attribute
     with the identifying address of the border router that
     originally advertised the withdrawn route must also be included
     in such an update message.

   - if the hierarchical model is deployed to interconnect Route
     Server clusters, it is advisable to include the RCID_PATH
     attribute in all routing updates sent between route servers as
     described in 4.3.4. The RCID_PATH attribute is never included
     in UPDATE messages sent to border routers.

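The advertisement rules listed above can be sketched as a single
decision function. This is only an illustration under stated
assumptions: the Peer record, the "border_router"/"route_server" kind
strings, the dictionary used to stand in for an UPDATE message, and
both function names are hypothetical and are not defined by this
document.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Peer:
    address: str                       # identifying address of the peer
    kind: str                          # "border_router" or "route_server"
    cluster_id: Optional[int] = None   # RCID for route servers, else None


def should_advertise(source: Peer, target: Peer, local_rcid: int,
                     hierarchical: bool) -> bool:
    """Decide whether a route acquired from `source` is sent to `target`."""
    if target.address == source.address:
        return False                       # no route echoing
    if target.kind == "border_router":
        return True                        # clients receive all other routes
    from_border_router = source.kind == "border_router"
    if target.cluster_id == local_rcid:
        return from_border_router          # intra-cluster exchange
    if hierarchical:
        # everything except routes acquired from the target's own cluster
        return source.cluster_id != target.cluster_id
    return from_border_router              # full-mesh model


def build_withdrawal(prefix: str, advertiser: str,
                     target: Peer) -> Dict[str, str]:
    """Withdraw a route by setting NEXT_HOP to the receiving client itself."""
    return {"prefix": prefix, "NEXT_HOP": target.address,
            "ADVERTISER": advertiser}
```

A receiving client that finds its own address in NEXT_HOP declares the
route unreachable, as required in 4.2.
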
4.3.3 Intra-Cluster Coordination
--------------------------------

In order to coordinate route advertisement activities, route servers
which are members of the same RS cluster establish and maintain
BGP/IDRP connections between themselves forming a full-mesh
connectivity. Normally, there is no need for more than two or three
route servers in one cluster.

Route servers belonging to the same cluster send to each other LIST
messages with lists of clients to which they're providing routing
information; let's call such clients "informed clients".

Each RS maintains a separate "informed client" list for each RS in
the local cluster including itself. All such lists are linked in
an ascending order that is determined by the number of clients in
each list; the order among the lists with the same number of clients
is determined by comparing the identifying addresses of the
corresponding RSs -- an RS in such a "same number of clients" subset
is positioned after all RSs with a lower address.

An RS can be in one of two RS coordination states: 'Initiation' and
'Active'.

4.3.3.1 Initiation State
------------------------

This is the initial state of a route server that is entered upon RS
startup. When the Initiation state is entered the 'InitiationTimer'
is started. The Initiation state transits to the Active state upon
expiration of the 'InitiationTimer' or as soon as all configured
BGP/IDRP connections to other route servers in the local RS Cluster
are established and LIST messages from those route servers are
received.

In the Initiation state an RS:

   o tries to establish connections with other RSs in the local and
     remote clusters.

   o accepts BGP/IDRP connections from client routers.

   o receives and processes BGP/IDRP updates but doesn't send any
     routing updates.

   o stores "informed client" lists received from other RSs in the
     local cluster - a newly received list replaces the existing list
     for the same RS. If a LIST message is received from a route
     server in another RS cluster, it should be silently ignored.

   o initializes an empty "informed client" list for its own clients.

   o as soon as a BGP/IDRP connection to an RS in the same RS Cluster
     is established, transmits an empty LIST message to such an RS.

4.3.3.2 Active State
--------------------

This state is entered upon expiration of the 'InitiationTimer' or as
soon as all configured BGP/IDRP connections to other route servers in
the local RS Cluster are established and LIST messages from those
route servers are received.

In the Active state an RS:

   o continues attempts to establish connections with other route
     servers in the local and remote clusters;

   o accepts new BGP/IDRP connections;

   o transmits a LIST message to an RS in the local cluster as soon
     as a BGP/IDRP session with the RS is established and then
     whenever the local "informed client" list changes;

   o receives and processes BGP/IDRP updates;

   o receives and processes "informed client" lists as described
     below:

     a) If a LIST message is received from a route server in
        another RS cluster, it should be silently ignored.

     b) If a LIST message is received from a route server that
        belongs to the same RS Cluster, the differences between
        the old and the new list are determined and the old "informed
        client" list for that RS is replaced by the list from the new
        message. For each client that was in the old list but not in
        the new list, the RS checks whether it has an established
        BGP/IDRP connection to that client and whether the client is
        absent from all of the other "informed client" lists. If both
        conditions are met, the processing described for a new client
        takes place (see 4.3.3.3).

   o for each new BGP/IDRP client (including connections established
     in Initiation state), decides if that client should become an
     "informed client", i.e. whether routing updates are to be sent
     to the client or that client has already been taken care of by
     another RS in the local cluster. The decision process is
     described in the next section.

4.3.3.3 New Client Processing
-----------------------------

Whenever an RS acquires a new BGP/IDRP peer it scans through all
"informed client" lists in order to determine if this peer has
already been receiving routing updates from another RS in the local
RS cluster. If the identifying address of the peer is found in one
of the lists, no routing updates are sent to that peer.

If the peer's Router ID is not found, the route server starts
a 'DelayTimer' timer for that peer and the decision is postponed
until that timer expires. The delay value is calculated as follows:

   the RS determines the relative position of its own "informed
   client" list in the linked list of all "informed client" lists.
   If such position is expressed with a number, say N, in the 1 to
   "maximum number of lists" range, then the delay value is set to
   (N-1)*DelayGranularity.

Upon expiration of the DelayTimer, the "informed client" lists are
scanned once again to see if the corresponding peer has already been
receiving routing updates from another RS in the local RS cluster.
If the Router ID of the peer is found in one of the lists as
a result of receiving a new LIST message, no routing updates are
sent to that peer. Otherwise, the peer's Router ID is entered in
the "informed client" list that belongs to the RS, the transmission
of the updated LIST message is immediately scheduled, and routing
updates are sent to the client.

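The list ordering and the DelayTimer calculation described above can be
sketched as follows. The representation is an assumption made for the
example: each RS is keyed by its identifying address written as an IP
address string, clients are plain Router ID strings, and the function
names are hypothetical. The DelayGranularity of 15 seconds is the
example value from 4.3.7.

```python
import ipaddress
from typing import Dict, Optional, Set

DELAY_GRANULARITY = 15.0   # seconds; example value from section 4.3.7


def list_position(own_id: str, informed: Dict[str, Set[str]]) -> int:
    """Return the 1-based position N of this RS's list in the ordering.

    Lists are ordered by ascending client count; ties are broken by
    ascending identifying address of the owning RS.
    """
    ordered = sorted(
        informed,
        key=lambda rs: (len(informed[rs]), int(ipaddress.ip_address(rs))),
    )
    return ordered.index(own_id) + 1


def delay_for_new_client(own_id: str, client: str,
                         informed: Dict[str, Set[str]]) -> Optional[float]:
    """Return the DelayTimer value, or None if the client is already served."""
    if any(client in clients for clients in informed.values()):
        return None
    return (list_position(own_id, informed) - 1) * DELAY_GRANULARITY


def on_delay_expired(own_id: str, client: str,
                     informed: Dict[str, Set[str]]) -> bool:
    """Re-check the lists; claim the client if no other RS has done so."""
    if any(client in clients
           for rs, clients in informed.items() if rs != own_id):
        return False                   # another RS won the race
    informed[own_id].add(client)       # an updated LIST would be sent here
    return True                        # start sending routing updates
```

With three route servers holding 2, 1 and 1 clients, the two lightly
loaded servers get delays of 0 and 15 seconds and the busiest one waits
30 seconds, so the least loaded server normally claims the new client.
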
The rationale for the delay is to minimize races in the decision as
to which RS among the route servers in the same RSC is going to provide
routing information to a given client. The RS with the least number of
"informed clients" would have the shortest delay and is the most
probable to win the race. This helps to equalize the number of
"informed clients" between RSs in a cluster.

After a BGP/IDRP peer is placed in the "informed client" list, it is
only removed from the list when the BGP/IDRP connection to this peer
is lost. While an RS client is in the list it is accurately updated
with all routing changes.

4.3.3.5 Inter-RS Connection Failure
-----------------------------------

If a route server loses a routing session with a route server in
the same cluster, it must consider taking over the responsibilities
of route advertisement to the clients that are in the "informed
client" list of the remote route server of the failed session.

For each such client, the RS checks whether it has an established
BGP/IDRP connection to that client and whether the client is absent
from the "informed client" lists of all active RSs. If both
conditions are true, the processing described for a new client takes
place (see 4.3.3.3).

After advertisement responsibilities are reconciled, the "informed
client" list associated with the failed session should be discarded.

4.3.4 RCID_PATH Attribute
-------------------------

The RCID_PATH is an optional non-transitive attribute that is composed
of a sequence of RS Cluster Identifiers (RCID) that identifies the
RS Clusters through which routing information carried in the UPDATE
message has passed. The Type Code of the RCID_PATH attribute is 254.
The attribute value field contains one or more RS Cluster Identifiers,
each encoded as a 2-octet field.

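A minimal sketch of the RCID_PATH encoding follows, assuming the
generic BGP-4 path attribute layout with a one-octet length (so the
extended-length form is not handled). The type code 254, the optional
non-transitive flags and the 2-octet identifiers come from this
document; the function names are hypothetical.

```python
import struct
from typing import List

RCID_PATH_TYPE = 254     # type code assigned in section 4.3.4
FLAG_OPTIONAL = 0x80     # BGP-4 attribute flag: optional, non-transitive


def encode_rcid_path(path: List[int]) -> bytes:
    """Encode a sequence of 2-octet RS Cluster Identifiers."""
    value = struct.pack("!%dH" % len(path), *path)
    return struct.pack("!BBB", FLAG_OPTIONAL, RCID_PATH_TYPE, len(value)) + value


def decode_rcid_path(attr: bytes) -> List[int]:
    """Return the list of RCIDs, leftmost entry first."""
    flags, type_code, length = struct.unpack("!BBB", attr[:3])
    if type_code != RCID_PATH_TYPE or length % 2 or len(attr) != 3 + length:
        raise ValueError("malformed RCID_PATH attribute")
    return list(struct.unpack("!%dH" % (length // 2), attr[3:]))
```

An empty list encodes to an attribute whose length field is zero.
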
When a route server propagates a route that has been learned from
another route server's UPDATE message, the following is performed
with respect to the RCID_PATH attribute:

  - if the destination of the route is a border router, the RCID_PATH
    attribute is excluded from the UPDATE message sent to that
    border router.

  - if the destination of the route is another route server that is
    located in the advertising server's own RS cluster, the
    RCID_PATH attribute is sent unmodified.

  - if the destination of the route is a route server in a different
    RS cluster, the advertising route server shall verify that the
    RCID of the destination speaker's cluster is not present in
    the RCID_PATH attribute associated with the route. If it is
    present, the route shall not be advertised and an event indicating
    that a route loop was detected should be logged; otherwise,
    the advertising route server shall prepend its own RCID to the
    RCID sequence in the RCID_PATH attribute (put it in the leftmost
    position).

When a route server propagates a route that has been learned from
a border router to another route server, then:

  - if the destination of the route is a route server that is
    located in the advertising route server's own RS cluster, an empty
    RCID_PATH attribute shall be included in the UPDATE message
    (an empty RCID_PATH attribute is one whose length field contains
    the value zero).

  - if the destination of the route is a route server in a different
    RS cluster, the advertising route server shall include its own
    RCID in the RCID_PATH attribute. In this case, the RCID of the
    advertising route server will be the only entry in the RCID_PATH
    attribute.
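The propagation rules above can be summarized in a small decision function. This is a sketch under stated assumptions: the names Dest and outgoing_rcid_path are hypothetical, a received path of None models a route learned from a border router, and omitting the attribute when such a route is sent to a border router is an assumption, since the text does not cover that case explicitly.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Dest(Enum):
    """Kind of peer the route is being advertised to (illustrative names)."""
    BORDER_ROUTER = 1
    RS_SAME_CLUSTER = 2
    RS_OTHER_CLUSTER = 3


def outgoing_rcid_path(
    received: Optional[List[int]],
    own_rcid: int,
    dest: Dest,
    dest_rcid: Optional[int] = None,
) -> Tuple[bool, Optional[List[int]]]:
    """Return (advertise, rcid_path); a path of None means "omit the attribute".

    received is the RCID_PATH of the route as learned from another route
    server, or None when the route was learned from a border router.
    """
    if dest is Dest.BORDER_ROUTER:
        return True, None                  # attribute never sent to border routers
    if received is None:                   # route learned from a border router
        if dest is Dest.RS_SAME_CLUSTER:
            return True, []                # empty attribute, length zero
        return True, [own_rcid]            # own RCID is the only entry
    if dest is Dest.RS_SAME_CLUSTER:
        return True, list(received)        # sent unmodified
    if dest_rcid in received:
        return False, None                 # loop detected: suppress (and log)
    return True, [own_rcid] + list(received)   # prepend in the leftmost position
```

For example, a route received with the path [5, 6] by a server in cluster 7 would go to a server in cluster 8 with the path [7, 5, 6], and would not be advertised at all to a server in cluster 5.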
4.3.5 NOTIFICATION Error Codes
------------------------------

In addition to the error codes defined in the BGP-4/IDRP specification,
the following error can be indicated in a NOTIFICATION message that is
sent by a route server:

   255    LIST Message Error

The following error subcodes can be associated with the LIST Message
Error:

   1 - Bad Address. This subcode indicates that a Client Identifying
       Address in the received LIST message does not represent
       a valid network layer address of a router interface.

The following additional UPDATE error subcodes are also defined:

   255 - Invalid ADVERTISER Attribute. This subcode indicates that
         a value of the ADVERTISER Attribute does not represent
         a valid network layer address of a router interface.

4.3.7 Timers
------------

An InitiationTimer value of 5 minutes is suggested.

In order to avoid route flaps during an RS switch-over, the value of
DelayGranularity should be chosen so that the maximum possible value of
the DelayTimer (see 4.3.3.3) combined with the Hold Time of inter-RS
connections is shorter than two-thirds of the smallest Hold Time
interval of all BGP/IDRP connections between the route servers and
their clients (including RSs in other clusters). So, in a cluster
with three RSs and respective Hold Times of 30 seconds (inter-RS
connections) and 90 seconds (client connections), a DelayGranularity
of 15 seconds would be a recommended value.

For the same reason, it is recommended that the Hold Time of BGP/IDRP
connections between route servers in the same cluster be set to
one-third of the smallest Hold Time of all BGP/IDRP connections
between the route servers and their clients (including RSs in other
clusters). So, if the smallest Hold Time of BGP/IDRP sessions with
clients is 90 seconds, the recommended value of the Hold Time of
BGP/IDRP connections between route servers in that cluster would be
30 seconds.

5. Route Server Discovery

This document does not propose any mechanism for dynamic RS
discovery by RS clients and/or by other route servers. It is
assumed that, at a minimum, manual configuration will be provided in
participating routers to achieve the needed connectivity.

7. Security Considerations

Security issues are not discussed in this document.

8. Acknowledgment

Some design concepts presented in this paper benefited from
discussions with Tony Li (cisco Systems).

The author would like to thank John Krawczyk (Bay Networks) for his
review and valuable comments.

Also, the author would like to thank Yakov Rekhter (IBM) for the review
of the earlier version of this document and constructive comments.

Special thanks to Ray Chang (Bay Networks), whose experience in
implementing the concepts presented in this document helped to
refine the route server design.

9. References

[BGP4]  Y. Rekhter, T. Li, "A Border Gateway Protocol 4
        (BGP-4)", Internet Draft

[IDRP]  Y. Rekhter, P. Traina, "IDRP for IPv6", Internet Draft

9. Author's Addresses

Dimitry Haskin
Bay Networks, Inc.
2 Federal Street
Billerica, MA 01821
email: dhaskin@baynetworks.com