Network Working Group                                      Y. Rekhter
INTERNET DRAFT                                          cisco Systems
                                                                T. Li
                                                              Juniper
                                                              Editors
                                                          August 1996


               A Border Gateway Protocol 4 (BGP-4)


Status of this Memo

   This document, together with its companion document, "Application of
   the Border Gateway Protocol in the Internet", defines an
   inter-autonomous system routing protocol for the Internet. This
   document specifies an IAB standards track protocol for the Internet
   community, and requests discussion and suggestions for improvements.
   Please refer to the current edition of the "IAB Official Protocol
   Standards" for the standardization state and status of this protocol.
   Distribution of this document is unlimited.

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress".

1. Acknowledgements

   This document was originally published as RFC 1267 in October 1991,
   jointly authored by Kirk Lougheed and Yakov Rekhter.

   We would like to express our thanks to Guy Almes, Len Bosack, and
   Jeffrey C. Honig for their contributions to the earlier version of
   this document.

   We would like to explicitly thank Bob Braden for his review of the
   earlier version of this document as well as his constructive and
   valuable comments.

   We would also like to thank Bob Hinden, Director for Routing of the
   Internet Engineering Steering Group, and the team of reviewers he
   assembled to review the previous version (BGP-2) of this document.
   This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia
   Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted
   with a strong combination of toughness, professionalism, and
   courtesy.

   This updated version of the document is the product of the IETF IDR
   Working Group with Yakov Rekhter and Tony Li as editors. Certain
   sections of the document borrowed heavily from IDRP [7], which is the
   OSI counterpart of BGP. For this, credit should be given to the ANSI
   X3S3.3 group chaired by Lyman Chapin and to Charles Kunzinger who was
   the IDRP editor within that group. We would also like to thank Mike
   Craren, Dimitry Haskin, John Krawczyk, and Paul Traina for their
   insightful comments.

   We would like to specially acknowledge numerous contributions by
   Dennis Ferguson.

2. Introduction

   The Border Gateway Protocol (BGP) is an inter-Autonomous System
   routing protocol. It is built on experience gained with EGP as
   defined in RFC 904 [1] and EGP usage in the NSFNET Backbone as
   described in RFC 1092 [2] and RFC 1093 [3].

   The primary function of a BGP speaking system is to exchange network
   reachability information with other BGP systems. This network
   reachability information includes information on the list of
   Autonomous Systems (ASs) that reachability information traverses.
   This information is sufficient to construct a graph of AS
   connectivity from which routing loops may be pruned and some policy
   decisions at the AS level may be enforced.

   BGP-4 provides a new set of mechanisms for supporting classless
   interdomain routing. These mechanisms include support for
   advertising an IP prefix and eliminate the concept of network
   "class" within BGP. BGP-4 also introduces mechanisms which allow
   aggregation of routes, including aggregation of AS paths. These
   changes provide support for the proposed supernetting scheme [8, 9].
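   As an illustration only, and not part of this specification, the
   Python sketch below shows what aggregating classless prefixes means
   at the address level: four contiguous class C networks collapse into
   a single /22 that no classful "network" can express. The addresses
   (taken from the 198.18.0.0/15 benchmarking block) and the helper
   classful_prefixlen are hypothetical examples; aggregation of AS paths
   is not shown.

```python
import ipaddress


def classful_prefixlen(addr: ipaddress.IPv4Address) -> int:
    """Hypothetical helper: prefix length implied by the pre-CIDR class."""
    first = addr.packed[0]
    if first < 128:
        return 8    # class A
    if first < 192:
        return 16   # class B
    if first < 224:
        return 24   # class C
    raise ValueError("class D/E addresses have no network prefix")


# Four contiguous class C networks (illustrative addresses only).
more_specifics = [ipaddress.IPv4Network(f"198.18.{i}.0/24") for i in range(4)]

# collapse_addresses merges them into the fewest covering prefixes.
aggregate = list(ipaddress.collapse_addresses(more_specifics))
print(aggregate)  # [IPv4Network('198.18.0.0/22')]

# The /22 is shorter than the class C boundary of 24 bits, so only a
# classless protocol can carry it as a single prefix.
print(aggregate[0].prefixlen, classful_prefixlen(aggregate[0].network_address))  # 22 24
```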

   To characterize the set of policy decisions that can be enforced
   using BGP, one must focus on the rule that a BGP speaker advertise to
   its peers (other BGP speakers which it communicates with) in
   neighboring ASs only those routes that it itself uses. This rule
   reflects the "hop-by-hop" routing paradigm generally used throughout
   the current Internet. Note that some policies cannot be supported by
   the "hop-by-hop" routing paradigm and thus require techniques such as
   source routing to enforce. For example, BGP does not enable one AS
   to send traffic to a neighboring AS intending that the traffic take a
   different route from that taken by traffic originating in the
   neighboring AS. On the other hand, BGP can support any policy
   conforming to the "hop-by-hop" routing paradigm. Since the current
   Internet uses only the "hop-by-hop" routing paradigm and since BGP
   can support any policy that conforms to that paradigm, BGP is highly
   applicable as an inter-AS routing protocol for the current Internet.

   A more complete discussion of what policies can and cannot be
   enforced with BGP is outside the scope of this document (but refer to
   the companion document discussing BGP usage [5]).

   BGP runs over a reliable transport protocol. This eliminates the
   need to implement explicit update fragmentation, retransmission,
   acknowledgement, and sequencing. Any authentication scheme used by
   the transport protocol may be used in addition to BGP's own
   authentication mechanisms. The error notification mechanism used in
   BGP assumes that the transport protocol supports a "graceful" close,
   i.e., that all outstanding data will be delivered before the
   connection is closed.

   BGP uses TCP [4] as its transport protocol. TCP meets BGP's
   transport requirements and is present in virtually all commercial
   routers and hosts.
   In the following descriptions the phrase "transport protocol
   connection" can be understood to refer to a TCP connection. BGP uses
   TCP port 179 for establishing its connections.

   This document uses the term "Autonomous System" (AS) throughout. The
   classic definition of an Autonomous System is a set of routers under
   a single technical administration, using an interior gateway protocol
   and common metrics to route packets within the AS, and using an
   exterior gateway protocol to route packets to other ASs. Since this
   classic definition was developed, it has become common for a single
   AS to use several interior gateway protocols and sometimes several
   sets of metrics within an AS. The use of the term Autonomous System
   here stresses the fact that, even when multiple IGPs and metrics are
   used, the administration of an AS appears to other ASs to have a
   single coherent interior routing plan and presents a consistent
   picture of what destinations are reachable through it.

   The planned use of BGP in the Internet environment, including such
   issues as topology, the interaction between BGP and IGPs, and the
   enforcement of routing policy rules is presented in a companion
   document [5]. This document is the first of a series of documents
   planned to explore various aspects of BGP application. Please send
   comments to the BGP mailing list (bgp@ans.net).

3. Summary of Operation

   Two systems form a transport protocol connection between one another.
   They exchange messages to open and confirm the connection parameters.
   The initial data flow is the entire BGP routing table. Incremental
   updates are sent as the routing tables change. BGP does not require
   periodic refresh of the entire BGP routing table. Therefore, a BGP
   speaker must retain the current version of the entire BGP routing
   tables of all of its peers for the duration of the connection.
   KEEPALIVE messages are sent periodically to ensure the liveness of
   the connection. NOTIFICATION messages are sent in response to errors
   or special conditions. If a connection encounters an error
   condition, a NOTIFICATION message is sent and the connection is
   closed.

   The hosts executing the Border Gateway Protocol need not be routers.
   A non-routing host could exchange routing information with routers
   via EGP or even an interior routing protocol. That non-routing host
   could then use BGP to exchange routing information with a border
   router in another Autonomous System. The implications and
   applications of this architecture are for further study.

   If a particular AS has multiple BGP speakers and is providing transit
   service for other ASs, then care must be taken to ensure a consistent
   view of routing within the AS. A consistent view of the interior
   routes of the AS is provided by the interior routing protocol. A
   consistent view of the routes exterior to the AS can be provided by
   having all BGP speakers within the AS maintain direct BGP connections
   with each other. Using a common set of policies, the BGP speakers
   arrive at an agreement as to which border routers will serve as
   exit/entry points for particular destinations outside the AS. This
   information is communicated to the AS's internal routers, possibly
   via the interior routing protocol. Care must be taken to ensure that
   the interior routers have all been updated with transit information
   before the BGP speakers announce to other ASs that transit service is
   being provided.

   Connections between BGP speakers of different ASs are referred to as
   "external" links. BGP connections between BGP speakers within the
   same AS are referred to as "internal" links. Similarly, a peer in a
   different AS is referred to as an external peer, while a peer in the
   same AS may be described as an internal peer.
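   The internal/external distinction and the full mesh of direct BGP
   connections described above can be sketched in Python. This is an
   illustration under assumptions: the Speaker type, the functions
   link_kind and full_mesh, the speaker names, and the AS numbers are
   all hypothetical, and the sketch says nothing about how the
   connections are actually established.

```python
from itertools import combinations
from typing import NamedTuple


class Speaker(NamedTuple):
    name: str  # hypothetical label for a BGP speaker
    asn: int   # Autonomous System the speaker belongs to


def link_kind(a: Speaker, b: Speaker) -> str:
    """Classify a BGP connection between two speakers."""
    return "internal" if a.asn == b.asn else "external"


def full_mesh(speakers: list[Speaker], asn: int) -> list[tuple[str, str]]:
    """Pairs of speakers in `asn` that each need a direct BGP connection."""
    local = [s for s in speakers if s.asn == asn]
    return [(a.name, b.name) for a, b in combinations(local, 2)]


r1, r2, r3 = Speaker("r1", 65001), Speaker("r2", 65001), Speaker("r3", 65001)
x1 = Speaker("x1", 65002)
print(link_kind(r1, x1))                   # external
print(full_mesh([r1, r2, r3, x1], 65001))  # [('r1', 'r2'), ('r1', 'r3'), ('r2', 'r3')]
```

   With n BGP speakers in an AS, the mesh contains n(n-1)/2 internal
   connections.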

3.1 Routes: Advertisement and Storage

   For purposes of this protocol a route is defined as a unit of
   information that pairs a destination with the attributes of a path to
   that destination:

      - Routes are advertised between a pair of BGP speakers in UPDATE
        messages: the destination is the set of systems whose IP
        addresses are reported in the Network Layer Reachability
        Information (NLRI) field, and the path is the information
        reported in the path attributes field of the same UPDATE
        message.

      - Routes are stored in the Routing Information Bases (RIBs):
        namely, the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out.
        Routes that will be advertised to other BGP speakers must be
        present in the Adj-RIBs-Out; routes that will be used by the
        local BGP speaker must be present in the Loc-RIB, and the next
        hop for each of these routes must be present in the local BGP
        speaker's forwarding information base; and routes that are
        received from other BGP speakers are present in the
        Adj-RIBs-In.

   If a BGP speaker chooses to advertise the route, it may add to or
   modify the path attributes of the route before advertising it to a
   peer.

   BGP provides mechanisms by which a BGP speaker can inform its peer
   that a previously advertised route is no longer available for use.
   There are three methods by which a given BGP speaker can indicate
   that a route has been withdrawn from service:

      a) the IP prefix that expresses destinations for a previously
         advertised route can be advertised in the WITHDRAWN ROUTES
         field in the UPDATE message, thus marking the associated route
         as being no longer available for use,

      b) a replacement route with the same Network Layer Reachability
         Information can be advertised, or

      c) the BGP speaker - BGP speaker connection can be closed, which
         implicitly removes from service all routes which the pair of
         speakers had advertised to each other.
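   The three withdrawal methods can be illustrated with a minimal Python
   sketch of the Adj-RIB-In kept for a single peer. The AdjRibIn class,
   its method names, and keying routes by a prefix string are
   assumptions made for illustration; a real implementation would also
   pass these changes on to the Decision Process.

```python
from dataclasses import dataclass, field


@dataclass
class AdjRibIn:
    """Routes learned from one peer, keyed by NLRI prefix (illustrative only)."""
    peer: str
    routes: dict[str, dict] = field(default_factory=dict)

    def announce(self, prefix: str, attrs: dict) -> None:
        # Method (b): a route with the same NLRI replaces the previous one.
        self.routes[prefix] = attrs

    def withdraw(self, prefix: str) -> None:
        # Method (a): the prefix appeared in the WITHDRAWN ROUTES field.
        # Silently ignoring an unknown prefix is a choice of this sketch.
        self.routes.pop(prefix, None)

    def connection_closed(self) -> None:
        # Method (c): closing the connection removes every route from this peer.
        self.routes.clear()


rib = AdjRibIn("peer-a")
rib.announce("198.51.100.0/24", {"NEXT_HOP": "192.0.2.1"})
rib.announce("198.51.100.0/24", {"NEXT_HOP": "192.0.2.2"})  # implicit replacement
rib.announce("203.0.113.0/24", {"NEXT_HOP": "192.0.2.1"})
rib.withdraw("203.0.113.0/24")
print(rib.routes)  # {'198.51.100.0/24': {'NEXT_HOP': '192.0.2.2'}}
rib.connection_closed()
print(rib.routes)  # {}
```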

3.2 Routing Information Bases

   The Routing Information Base (RIB) within a BGP speaker consists of
   three distinct parts:

      a) Adj-RIBs-In: The Adj-RIBs-In store routing information that has
         been learned from inbound UPDATE messages. Their contents
         represent routes that are available as an input to the Decision
         Process.

      b) Loc-RIB: The Loc-RIB contains the local routing information
         that the BGP speaker has selected by applying its local
         policies to the routing information contained in its
         Adj-RIBs-In.

      c) Adj-RIBs-Out: The Adj-RIBs-Out store the information that the
         local BGP speaker has selected for advertisement to its peers.
         The routing information stored in the Adj-RIBs-Out will be
         carried in the local BGP speaker's UPDATE messages and
         advertised to its peers.

   In summary, the Adj-RIBs-In contain unprocessed routing information
   that has been advertised to the local BGP speaker by its peers; the
   Loc-RIB contains the routes that have been selected by the local BGP
   speaker's Decision Process; and the Adj-RIBs-Out organize the routes
   for advertisement to specific peers by means of the local speaker's
   UPDATE messages.

   Although the conceptual model distinguishes between Adj-RIBs-In,
   Loc-RIB, and Adj-RIBs-Out, this neither implies nor requires that an
   implementation must maintain three separate copies of the routing
   information. The choice of implementation (for example, 3 copies of
   the information vs 1 copy with pointers) is not constrained by the
   protocol.

4. Message Formats

   This section describes message formats used by BGP.

   Messages are sent over a reliable transport protocol connection. A
   message is processed only after it is entirely received. The maximum
   message size is 4096 octets. All implementations are required to
   support this maximum message size.
   The smallest message that may be sent consists of a BGP header
   without a data portion, or 19 octets.

4.1 Message Header Format

   Each message has a fixed-size header. There may or may not be a data
   portion following the header, depending on the message type. The
   layout of these fields is shown below:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                                                               +
      |                                                               |
      +                                                               +
      |                            Marker                             |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |     Type      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Marker:

         This 16-octet field contains a value that the receiver of the
         message can predict. If the Type of the message is OPEN, or if
         the OPEN message carries no Authentication Information (as an
         Optional Parameter), then the Marker must be all ones.
         Otherwise, the value of the Marker can be predicted by a
         computation specified as part of the authentication mechanism
         (which is specified as part of the Authentication Information)
         used. The Marker can be used to detect loss of synchronization
         between a pair of BGP peers, and to authenticate incoming BGP
         messages.

      Length:

         This 2-octet unsigned integer indicates the total length of the
         message, including the header, in octets. Thus, e.g., it
         allows one to locate in the transport-level stream the (Marker
         field of the) next message. The value of the Length field must
         always be at least 19 and no greater than 4096, and may be
         further constrained, depending on the message type. No
         "padding" of extra data after the message is allowed, so the
         Length field must have the smallest value required given the
         rest of the message.

      Type:

         This 1-octet unsigned integer indicates the type code of the
         message.
         The following type codes are defined:

            1 - OPEN
            2 - UPDATE
            3 - NOTIFICATION
            4 - KEEPALIVE

4.2 OPEN Message Format

   After a transport protocol connection is established, the first
   message sent by each side is an OPEN message. If the OPEN message is
   acceptable, a KEEPALIVE message confirming the OPEN is sent back.
   Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION
   messages may be exchanged.

   In addition to the fixed-size BGP header, the OPEN message contains
   the following fields:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+
      |    Version    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     My Autonomous System      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Hold Time           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         BGP Identifier                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Opt Parm Len  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                      Optional Parameters                      |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Version:

         This 1-octet unsigned integer indicates the protocol version
         number of the message. The current BGP version number is 4.

      My Autonomous System:

         This 2-octet unsigned integer indicates the Autonomous System
         number of the sender.

      Hold Time:

         This 2-octet unsigned integer indicates the number of seconds
         that the sender proposes for the value of the Hold Timer. Upon
         receipt of an OPEN message, a BGP speaker MUST calculate the
         value of the Hold Timer by using the smaller of its configured
         Hold Time and the Hold Time received in the OPEN message. The
         Hold Time MUST be either zero or at least three seconds. An
         implementation may reject connections on the basis of the Hold
         Time.
         The calculated value indicates the maximum number of seconds
         that may elapse between the receipt of successive KEEPALIVE
         and/or UPDATE messages from the sender.

      BGP Identifier:

         This 4-octet unsigned integer indicates the BGP Identifier of
         the sender. A given BGP speaker sets the value of its BGP
         Identifier to an IP address assigned to that BGP speaker. The
         value of the BGP Identifier is determined on startup and is the
         same for every local interface and every BGP peer.

      Optional Parameters Length:

         This 1-octet unsigned integer indicates the total length of the
         Optional Parameters field in octets. If the value of this field
         is zero, no Optional Parameters are present.

      Optional Parameters:

         This field may contain a list of optional parameters, where
         each parameter is encoded as a <Parameter Type, Parameter
         Length, Parameter Value> triplet.

          0                   1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
         |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

         Parameter Type is a one octet field that unambiguously
         identifies individual parameters. Parameter Length is a one
         octet field that contains the length of the Parameter Value
         field in octets. Parameter Value is a variable length field
         that is interpreted according to the value of the Parameter
         Type field.

         This document defines the following Optional Parameters:

         a) Authentication Information (Parameter Type 1):

            This optional parameter may be used to authenticate a BGP
            peer. The Parameter Value field contains a 1-octet
            Authentication Code followed by variable-length
            Authentication Data.

             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |  Auth. Code   |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |                                                     |
            |                Authentication Data                  |
            |                                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Authentication Code:

               This 1-octet unsigned integer indicates the
               authentication mechanism being used. Whenever an
               authentication mechanism is specified for use within
               BGP, three things must be included in the
               specification:

               - the value of the Authentication Code which indicates
                 use of the mechanism,
               - the form and meaning of the Authentication Data, and
               - the algorithm for computing values of Marker fields.

               Note that a separate authentication mechanism may be
               used in establishing the transport level connection.

            Authentication Data:

               This is a variable-length field whose form and meaning
               depend on the Authentication Code.

   The minimum length of the OPEN message is 29 octets (including the
   message header).

4.3 UPDATE Message Format

   UPDATE messages are used to transfer routing information between BGP
   peers. The information in the UPDATE packet can be used to construct
   a graph describing the relationships of the various Autonomous
   Systems. By applying rules to be discussed, routing information
   loops and some other anomalies may be detected and removed from
   inter-AS routing.

   An UPDATE message is used to advertise a single feasible route to a
   peer, or to withdraw multiple unfeasible routes from service (see
   3.1). An UPDATE message may simultaneously advertise a feasible route
   and withdraw multiple unfeasible routes from service.
   The UPDATE message always includes the fixed-size BGP header, and can
   optionally include the other fields as shown below:

      +-----------------------------------------------------+
      |   Unfeasible Routes Length (2 octets)               |
      +-----------------------------------------------------+
      |   Withdrawn Routes (variable)                       |
      +-----------------------------------------------------+
      |   Total Path Attribute Length (2 octets)            |
      +-----------------------------------------------------+
      |   Path Attributes (variable)                        |
      +-----------------------------------------------------+
      |   Network Layer Reachability Information (variable) |
      +-----------------------------------------------------+

      Unfeasible Routes Length:

         This 2-octet unsigned integer indicates the total length of
         the Withdrawn Routes field in octets. Its value must allow the
         length of the Network Layer Reachability Information field to
         be determined as specified below.

         A value of 0 indicates that no routes are being withdrawn from
         service, and that the WITHDRAWN ROUTES field is not present in
         this UPDATE message.

      Withdrawn Routes:

         This is a variable length field that contains a list of IP
         address prefixes for the routes that are being withdrawn from
         service. Each IP address prefix is encoded as a 2-tuple of the
         form <length, prefix>, whose fields are described below:

            +---------------------------+
            |   Length (1 octet)        |
            +---------------------------+
            |   Prefix (variable)       |
            +---------------------------+

         The use and the meaning of these fields are as follows:

         a) Length:

            The Length field indicates the length in bits of the IP
            address prefix. A length of zero indicates a prefix that
            matches all IP addresses (with prefix, itself, of zero
            octets).

         b) Prefix:

            The Prefix field contains IP address prefixes followed by
            enough trailing bits to make the end of the field fall on an
            octet boundary.
            Note that the value of the trailing bits is irrelevant.

      Total Path Attribute Length:

         This 2-octet unsigned integer indicates the total length of the
         Path Attributes field in octets. Its value must allow the
         length of the Network Layer Reachability Information field to
         be determined as specified below.

         A value of 0 indicates that no Network Layer Reachability
         Information field is present in this UPDATE message.

      Path Attributes:

         A variable length sequence of path attributes is present in
         every UPDATE. Each path attribute is a triple <attribute type,
         attribute length, attribute value> of variable length.

         Attribute Type is a two-octet field that consists of the
         Attribute Flags octet followed by the Attribute Type Code
         octet.

          0                   1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  Attr. Flags  |Attr. Type Code|
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         The high-order bit (bit 0) of the Attribute Flags octet is the
         Optional bit. It defines whether the attribute is optional (if
         set to 1) or well-known (if set to 0).

         The second high-order bit (bit 1) of the Attribute Flags octet
         is the Transitive bit. It defines whether an optional
         attribute is transitive (if set to 1) or non-transitive (if set
         to 0). For well-known attributes, the Transitive bit must be
         set to 1. (See Section 5 for a discussion of transitive
         attributes.)

         The third high-order bit (bit 2) of the Attribute Flags octet
         is the Partial bit. It defines whether the information
         contained in the optional transitive attribute is partial (if
         set to 1) or complete (if set to 0). For well-known attributes
         and for optional non-transitive attributes the Partial bit must
         be set to 0.

         The fourth high-order bit (bit 3) of the Attribute Flags octet
         is the Extended Length bit. It defines whether the Attribute
         Length is one octet (if set to 0) or two octets (if set to 1).
         Extended Length may be used only if the length of the attribute
         value is greater than 255 octets.

         The lower-order four bits of the Attribute Flags octet are
         unused. They must be zero (and must be ignored when received).

         The Attribute Type Code octet contains the Attribute Type Code.
         Currently defined Attribute Type Codes are discussed in Section
         5.

         If the Extended Length bit of the Attribute Flags octet is set
         to 0, the third octet of the Path Attribute contains the length
         of the attribute data in octets.

         If the Extended Length bit of the Attribute Flags octet is set
         to 1, then the third and the fourth octets of the path
         attribute contain the length of the attribute data in octets.

         The remaining octets of the Path Attribute represent the
         attribute value and are interpreted according to the Attribute
         Flags and the Attribute Type Code. The supported Attribute Type
         Codes, their attribute values and uses are the following:

         a) ORIGIN (Type Code 1):

            ORIGIN is a well-known mandatory attribute that defines the
            origin of the path information. The data octet can assume
            the following values:

            Value      Meaning

              0        IGP - Network Layer Reachability Information
                       is interior to the originating AS

              1        EGP - Network Layer Reachability Information
                       learned via EGP

              2        INCOMPLETE - Network Layer Reachability
                       Information learned by some other means

            Its usage is defined in 5.1.1.

         b) AS_PATH (Type Code 2):

            AS_PATH is a well-known mandatory attribute that is composed
            of a sequence of AS path segments. Each AS path segment is
            represented by a triple <path segment type, path segment
            length, path segment value>.

            The path segment type is a 1-octet long field with the
            following values defined:

               Value      Segment Type

               1          AS_SET: unordered set of ASs a route in the
                             UPDATE message has traversed

               2          AS_SEQUENCE: ordered set of ASs a route in
                             the UPDATE message has traversed

            The path segment length is a 1-octet long field containing
            the number of ASs in the path segment value field.

            The path segment value field contains one or more AS
            numbers, each encoded as a 2-octet long field.

            Usage of this attribute is defined in 5.1.2.

         c) NEXT_HOP (Type Code 3):

            This is a well-known mandatory attribute that defines the IP
            address of the border router that should be used as the next
            hop to the destinations listed in the Network Layer
            Reachability Information field of the UPDATE message.

            Usage of this attribute is defined in 5.1.3.

         d) MULTI_EXIT_DISC (Type Code 4):

            This is an optional non-transitive attribute that is a
            four-octet non-negative integer. The value of this attribute
            may be used by a BGP speaker's decision process to
            discriminate among multiple exit points to a neighboring
            autonomous system.

            Its usage is defined in 5.1.4.

         e) LOCAL_PREF (Type Code 5):

            LOCAL_PREF is a well-known discretionary attribute that is a
            four-octet non-negative integer. It is used by a BGP speaker
            to inform other BGP speakers in its own autonomous system of
            the originating speaker's degree of preference for an
            advertised route. Usage of this attribute is described in
            5.1.5.

         f) ATOMIC_AGGREGATE (Type Code 6):

            ATOMIC_AGGREGATE is a well-known discretionary attribute of
            length 0. It is used by a BGP speaker to inform other BGP
            speakers that the local system selected a less specific
            route without selecting a more specific route which is
            included in it. Usage of this attribute is described in
            5.1.6.

         g) AGGREGATOR (Type Code 7):

            AGGREGATOR is an optional transitive attribute of length 6.
            The attribute contains the last AS number that formed the
            aggregate route (encoded as 2 octets), followed by the IP
            address of the BGP speaker that formed the aggregate route
            (encoded as 4 octets). Usage of this attribute is described
            in 5.1.7.

      Network Layer Reachability Information:

         This variable length field contains a list of IP address
         prefixes. The length in octets of the Network Layer
         Reachability Information is not encoded explicitly, but can be
         calculated as:

            UPDATE message Length - 23 - Total Path Attribute Length -
            Unfeasible Routes Length

         where UPDATE message Length is the value encoded in the
         fixed-size BGP header, Total Path Attribute Length and
         Unfeasible Routes Length are the values encoded in the variable
         part of the UPDATE message, and 23 is the combined length of
         the fixed-size BGP header, the Total Path Attribute Length
         field and the Unfeasible Routes Length field.

         Reachability information is encoded as one or more 2-tuples of
         the form <length, prefix>, whose fields are described below:

                  +---------------------------+
                  |   Length (1 octet)        |
                  +---------------------------+
                  |   Prefix (variable)       |
                  +---------------------------+

         The use and the meaning of these fields are as follows:

         a) Length:

            The Length field indicates the length in bits of the IP
            address prefix. A length of zero indicates a prefix that
            matches all IP addresses (with prefix, itself, of zero
            octets).

         b) Prefix:

            The Prefix field contains IP address prefixes followed by
            enough trailing bits to make the end of the field fall on an
            octet boundary. Note that the value of the trailing bits is
            irrelevant.
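
   As an illustration of the length arithmetic above, the following
   Python sketch splits a received UPDATE message into its withdrawn
   routes, path attributes and NLRI. It is a minimal sketch, not a
   reference implementation, and it assumes: IPv4 prefixes only; a
   complete message that still carries the 19-octet fixed header
   (16-octet Marker, 2-octet Length, 1-octet Type, with Type 2 meaning
   UPDATE); and the field order defined at the start of Section 4.3
   (Unfeasible Routes Length, Withdrawn Routes, Total Path Attribute
   Length, Path Attributes, NLRI). The function names are hypothetical.
   Only length consistency is checked, and attribute values are
   returned undecoded.

```python
import ipaddress
import struct
from typing import List, Tuple

# Attribute Flags bits; bit 0 is the high-order bit of the octet.
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10

HEADER_LEN = 19   # Marker (16) + Length (2) + Type (1)
UPDATE_TYPE = 2


def flag_names(flags: int) -> List[str]:
    """Name the Attribute Flags bits that are set."""
    names = {OPTIONAL: "Optional", TRANSITIVE: "Transitive",
             PARTIAL: "Partial", EXTENDED_LENGTH: "Extended Length"}
    return [name for bit, name in names.items() if flags & bit]


def decode_prefixes(data: bytes) -> List[ipaddress.IPv4Network]:
    """Decode a run of <length, prefix> 2-tuples (IPv4 only)."""
    networks = []
    i = 0
    while i < len(data):
        bits = data[i]
        if bits > 32:
            raise ValueError("prefix length exceeds 32 bits")
        nbytes = (bits + 7) // 8  # prefix is padded to an octet boundary
        prefix = data[i + 1:i + 1 + nbytes]
        if len(prefix) != nbytes:
            raise ValueError("truncated prefix")
        addr = int.from_bytes(prefix + bytes(4 - nbytes), "big")
        # strict=False clears the trailing bits, whose value is irrelevant.
        networks.append(ipaddress.IPv4Network((addr, bits), strict=False))
        i += 1 + nbytes
    return networks


def decode_attributes(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Split the Path Attributes field into (flags, type_code, value)."""
    attrs = []
    i = 0
    while i < len(data):
        header = 4 if data[i] & EXTENDED_LENGTH else 3
        if i + header > len(data):
            raise ValueError("truncated attribute header")
        flags, type_code = data[i], data[i + 1]
        if header == 4:
            (length,) = struct.unpack_from("!H", data, i + 2)
        else:
            length = data[i + 2]
        start = i + header
        value = data[start:start + length]
        if len(value) != length:
            raise ValueError("attribute value overruns the field")
        attrs.append((flags, type_code, value))
        i = start + length
    return attrs


def parse_update(msg: bytes):
    """Return (withdrawn, attributes, nlri) for one complete UPDATE."""
    if len(msg) < HEADER_LEN or msg[18] != UPDATE_TYPE:
        raise ValueError("not an UPDATE message")
    (msg_len,) = struct.unpack_from("!H", msg, 16)
    if msg_len < 23 or msg_len > len(msg):
        raise ValueError("Bad Message Length")
    body = msg[HEADER_LEN:msg_len]
    (unfeasible_len,) = struct.unpack_from("!H", body, 0)
    if unfeasible_len + 23 > msg_len:
        raise ValueError("Malformed Attribute List")
    (attr_len,) = struct.unpack_from("!H", body, 2 + unfeasible_len)
    # The NLRI length is implicit: Length - 23 - attr_len - unfeasible_len.
    nlri_len = msg_len - 23 - attr_len - unfeasible_len
    if nlri_len < 0:
        raise ValueError("Malformed Attribute List")
    attr_start = 4 + unfeasible_len
    withdrawn = decode_prefixes(body[2:2 + unfeasible_len])
    attrs = decode_attributes(body[attr_start:attr_start + attr_len])
    nlri = decode_prefixes(body[attr_start + attr_len:])
    return withdrawn, attrs, nlri
```

   Passing strict=False to the ipaddress module clears the trailing
   bits of each prefix, which matches the statement that their value is
   irrelevant.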

   The minimum length of the UPDATE message is 23 octets: 19 octets
   for the fixed header + 2 octets for the Unfeasible Routes Length + 2
   octets for the Total Path Attribute Length (the value of Unfeasible
   Routes Length is 0 and the value of Total Path Attribute Length is
   0).

   An UPDATE message can advertise at most one route, which may be
   described by several path attributes. All path attributes contained
   in a given UPDATE message apply to the destinations carried in the
   Network Layer Reachability Information field of the UPDATE message.

   An UPDATE message can list multiple routes to be withdrawn from
   service. Each such route is identified by its destination (expressed
   as an IP prefix), which unambiguously identifies the route in the
   context of the BGP speaker-to-BGP speaker connection to which it has
   previously been advertised.

   An UPDATE message may advertise only routes to be withdrawn from
   service, in which case it will not include path attributes or Network
   Layer Reachability Information. Conversely, it may advertise only a
   feasible route, in which case the Withdrawn Routes field need not be
   present.

4.4 KEEPALIVE Message Format

   BGP does not use any transport protocol-based keep-alive mechanism to
   determine if peers are reachable. Instead, KEEPALIVE messages are
   exchanged between peers often enough so as not to cause the Hold
   Timer to expire. A reasonable maximum time between KEEPALIVE messages
   would be one third of the Hold Time interval. KEEPALIVE messages
   MUST NOT be sent more frequently than one per second. An
   implementation MAY adjust the rate at which it sends KEEPALIVE
   messages as a function of the Hold Time interval.

   If the negotiated Hold Time interval is zero, then periodic KEEPALIVE
   messages MUST NOT be sent.

   A KEEPALIVE message consists of only the message header and has a
   length of 19 octets.
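
   The KEEPALIVE timing rules and the fixed message sizes above can be
   sketched in Python as follows. The one-third ratio is the "reasonable
   maximum" suggested by the text rather than a requirement, and the
   helper names are hypothetical. The Type code 4 for KEEPALIVE comes
   from the message header definition earlier in Section 4, and the
   all-ones Marker assumes that no Authentication Information Optional
   Parameter was negotiated.

```python
from typing import Optional

HEADER_LEN = 19                      # Marker (16) + Length (2) + Type (1)
KEEPALIVE_TYPE = 4                   # message Type code for KEEPALIVE
MIN_UPDATE_LEN = HEADER_LEN + 2 + 2  # both length fields present, both 0


def keepalive_interval(hold_time: int) -> Optional[int]:
    """Seconds between KEEPALIVE messages for a negotiated Hold Time."""
    if hold_time == 0:
        return None  # periodic KEEPALIVE messages MUST NOT be sent
    # At most one third of the Hold Time, but never more often than
    # once per second.
    return max(1, hold_time // 3)


def encode_keepalive() -> bytes:
    """A KEEPALIVE is only the fixed header, 19 octets long."""
    marker = b"\xff" * 16  # assumes no authentication: Marker is all ones
    return marker + HEADER_LEN.to_bytes(2, "big") + bytes([KEEPALIVE_TYPE])
```

   With a negotiated Hold Time of 90 seconds, keepalive_interval returns
   30. Hold Times of one or two seconds never reach this helper, because
   an implementation must reject them when processing the OPEN message.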

4.5 NOTIFICATION Message Format

   A NOTIFICATION message is sent when an error condition is detected.
   The BGP connection is closed immediately after sending it.

   In addition to the fixed-size BGP header, the NOTIFICATION message
   contains the following fields:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Error code    | Error subcode |   Data                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Error Code:

      This 1-octet unsigned integer indicates the type of
      NOTIFICATION. The following Error Codes have been defined:

         Error Code       Symbolic Name               Reference

             1            Message Header Error        Section 6.1

             2            OPEN Message Error          Section 6.2

             3            UPDATE Message Error        Section 6.3

             4            Hold Timer Expired          Section 6.5

             5            Finite State Machine Error  Section 6.6

             6            Cease                       Section 6.7

   Error subcode:

      This 1-octet unsigned integer provides more specific
      information about the nature of the reported error. Each Error
      Code may have one or more Error Subcodes associated with it.
      If no appropriate Error Subcode is defined, then a zero
      (Unspecific) value is used for the Error Subcode field.

      Message Header Error subcodes:

         1 - Connection Not Synchronized.
         2 - Bad Message Length.
         3 - Bad Message Type.

      OPEN Message Error subcodes:

         1 - Unsupported Version Number.
         2 - Bad Peer AS.
         3 - Bad BGP Identifier.
         4 - Unsupported Optional Parameter.
         5 - Authentication Failure.
         6 - Unacceptable Hold Time.

      UPDATE Message Error subcodes:

         1 - Malformed Attribute List.
         2 - Unrecognized Well-known Attribute.
         3 - Missing Well-known Attribute.
         4 - Attribute Flags Error.
         5 - Attribute Length Error.
         6 - Invalid ORIGIN Attribute.
         7 - AS Routing Loop.
         8 - Invalid NEXT_HOP Attribute.
         9 - Optional Attribute Error.
        10 - Invalid Network Field.
        11 - Malformed AS_PATH.

   Data:

      This variable-length field is used to diagnose the reason for
      the NOTIFICATION. The contents of the Data field depend upon
      the Error Code and Error Subcode. See Section 6 below for more
      details.

      Note that the length of the Data field can be determined from
      the message Length field by the formula:

         Message Length = 21 + Data Length

   The minimum length of the NOTIFICATION message is 21 octets
   (including message header).

5. Path Attributes

   This section discusses the path attributes of the UPDATE message.

   Path attributes fall into four separate categories:

      1. Well-known mandatory.
      2. Well-known discretionary.
      3. Optional transitive.
      4. Optional non-transitive.

   Well-known attributes must be recognized by all BGP implementations.
   Some of these attributes are mandatory and must be included in every
   UPDATE message. Others are discretionary and may or may not be sent
   in a particular UPDATE message.

   All well-known attributes must be passed along (after proper
   updating, if necessary) to other BGP peers.

   In addition to well-known attributes, each path may contain one or
   more optional attributes. It is not required or expected that all
   BGP implementations support all optional attributes. The handling of
   an unrecognized optional attribute is determined by the setting of
   the Transitive bit in the attribute flags octet. Paths with
   unrecognized transitive optional attributes should be accepted. If a
   path with an unrecognized transitive optional attribute is accepted
   and passed along to other BGP peers, then the unrecognized transitive
   optional attribute of that path must be passed along with the path to
   other BGP peers with the Partial bit in the Attribute Flags octet set
   to 1.
   If a path with a recognized transitive optional attribute is
   accepted and passed along to other BGP peers and the Partial bit in
   the Attribute Flags octet is set to 1 by some previous AS, it is not
   set back to 0 by the current AS. Unrecognized non-transitive optional
   attributes must be quietly ignored and not passed along to other BGP
   peers.

   New transitive optional attributes may be attached to the path by the
   originator or by any other AS in the path. If they are not attached
   by the originator, the Partial bit in the Attribute Flags octet is
   set to 1. The rules for attaching new non-transitive optional
   attributes will depend on the nature of the specific attribute. The
   documentation of each new non-transitive optional attribute will be
   expected to include such rules. (The description of the
   MULTI_EXIT_DISC attribute gives an example.) All optional attributes
   (both transitive and non-transitive) may be updated (if appropriate)
   by ASs in the path.

   The sender of an UPDATE message should order path attributes within
   the UPDATE message in ascending order of attribute type. The
   receiver of an UPDATE message must be prepared to handle path
   attributes within the UPDATE message that are out of order.

   The same attribute cannot appear more than once within the Path
   Attributes field of a particular UPDATE message.

5.1 Path Attribute Usage

   The usage of each BGP path attribute is described in the following
   clauses.

5.1.1 ORIGIN

   ORIGIN is a well-known mandatory attribute. The ORIGIN attribute
   shall be generated by the autonomous system that originates the
   associated routing information. It shall be included in the UPDATE
   messages of all BGP speakers that choose to propagate this
   information to other BGP speakers.

5.1.2 AS_PATH

   AS_PATH is a well-known mandatory attribute.
   This attribute identifies the autonomous systems through which
   routing information carried in this UPDATE message has passed. The
   components of this list can be AS_SETs or AS_SEQUENCEs.

   When a BGP speaker propagates a route which it has learned from
   another BGP speaker's UPDATE message, it shall modify the route's
   AS_PATH attribute based on the location of the BGP speaker to which
   the route will be sent:

      a) When a given BGP speaker advertises the route to another BGP
         speaker located in its own autonomous system, the advertising
         speaker shall not modify the AS_PATH attribute associated with
         the route.

      b) When a given BGP speaker advertises the route to a BGP speaker
         located in a neighboring autonomous system, then the
         advertising speaker shall update the AS_PATH attribute as
         follows:

         1) if the first path segment of the AS_PATH is of type
            AS_SEQUENCE, the local system shall prepend its own AS
            number as the last element of the sequence (put it in the
            leftmost position);

         2) if the first path segment of the AS_PATH is of type AS_SET,
            the local system shall prepend a new path segment of type
            AS_SEQUENCE to the AS_PATH, including its own AS number in
            that segment.

   When a BGP speaker originates a route then:

      a) the originating speaker shall include its own AS number in
         the AS_PATH attribute of all UPDATE messages sent to BGP
         speakers located in neighboring autonomous systems. (In this
         case, the AS number of the originating speaker's autonomous
         system will be the only entry in the AS_PATH attribute.)

      b) the originating speaker shall include an empty AS_PATH
         attribute in all UPDATE messages sent to BGP speakers located
         in its own autonomous system. (An empty AS_PATH attribute is
         one whose length field contains the value zero.)
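
   The AS_PATH rules above can be sketched in Python by modelling
   AS_PATH as a list of (segment type, AS numbers) pairs. This is an
   illustrative sketch with hypothetical function names. It assumes
   2-octet AS numbers as in the attribute encoding of Section 4.3, and
   it treats an empty learned AS_PATH like the AS_SET case, which the
   text does not address explicitly.

```python
from typing import List, Tuple

AS_SET = 1
AS_SEQUENCE = 2

Segment = Tuple[int, List[int]]  # (path segment type, AS numbers)


def update_as_path(path: List[Segment], local_as: int,
                   peer_is_internal: bool) -> List[Segment]:
    """Apply the propagation rules of Section 5.1.2 to a learned AS_PATH."""
    if peer_is_internal:
        return list(path)  # rule a): internal peers get it unmodified
    if path and path[0][0] == AS_SEQUENCE:
        # rule b) 1): put the local AS in the leftmost position
        return [(AS_SEQUENCE, [local_as] + path[0][1])] + path[1:]
    # rule b) 2): first segment is an AS_SET. Assumption: an empty path
    # is handled the same way, by starting a new AS_SEQUENCE.
    return [(AS_SEQUENCE, [local_as])] + list(path)


def originate_as_path(local_as: int, peer_is_internal: bool) -> List[Segment]:
    """AS_PATH for a locally originated route."""
    return [] if peer_is_internal else [(AS_SEQUENCE, [local_as])]


def encode_as_path(path: List[Segment]) -> bytes:
    """Encode segments as <type, length, value> with 2-octet AS numbers."""
    out = bytearray()
    for seg_type, asns in path:
        # The segment length is a 1-octet field holding one or more ASs.
        if not 1 <= len(asns) <= 255:
            raise ValueError("segment must hold 1..255 AS numbers")
        out += bytes([seg_type, len(asns)])
        for asn in asns:
            out += asn.to_bytes(2, "big")
    return bytes(out)
```

   An empty list encodes to zero octets, which gives the "empty AS_PATH
   attribute" whose length field contains the value zero.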

5.1.3 NEXT_HOP

   The NEXT_HOP path attribute defines the IP address of the border
   router that should be used as the next hop to the destinations listed
   in the UPDATE message. If a border router belongs to the same AS as
   its peer, then the peer is an internal border router. Otherwise, it
   is an external border router. A BGP speaker can advertise any
   internal border router as the next hop, provided that the interface
   associated with the IP address of this border router (as specified in
   the NEXT_HOP path attribute) shares a common subnet with both the
   local and remote BGP speakers. A BGP speaker can advertise any
   external border router as the next hop, provided that the IP address
   of this border router was learned from one of the BGP speaker's
   peers, and the interface associated with the IP address of this
   border router (as specified in the NEXT_HOP path attribute) shares a
   common subnet with the local and remote BGP speakers. A BGP speaker
   needs to be able to support disabling advertisement of external
   border routers.

   A BGP speaker must never advertise an address of a peer to that peer
   as a NEXT_HOP for a route that the speaker is originating. A BGP
   speaker must never install a route with itself as the next hop.

   When a BGP speaker advertises the route to a BGP speaker located in
   its own autonomous system, the advertising speaker shall not modify
   the NEXT_HOP attribute associated with the route. When a BGP speaker
   receives the route via an internal link, it may forward packets to
   the NEXT_HOP address if the address contained in the attribute is on
   a common subnet with the local and remote BGP speakers.

5.1.4 MULTI_EXIT_DISC

   The MULTI_EXIT_DISC attribute may be used on external (inter-AS)
   links to discriminate among multiple exit or entry points to the same
   neighboring AS.
   The value of the MULTI_EXIT_DISC attribute is a four-octet unsigned
   number which is called a metric. All other factors being equal, the
   exit or entry point with the lower metric should be preferred. If
   received over external links, the MULTI_EXIT_DISC attribute may be
   propagated over internal links to other BGP speakers within the same
   AS. The MULTI_EXIT_DISC attribute is never propagated to other BGP
   speakers in neighboring ASs.

5.1.5 LOCAL_PREF

   LOCAL_PREF is a well-known discretionary attribute that shall be
   included in all UPDATE messages that a given BGP speaker sends to the
   other BGP speakers located in its own autonomous system. A BGP
   speaker shall calculate the degree of preference for each external
   route and include the degree of preference when advertising a route
   to its internal peers. The route with the higher degree of
   preference should be preferred. A BGP speaker shall use the degree
   of preference learned via LOCAL_PREF in its decision process (see
   Section 9.1.1).

   A BGP speaker shall not include this attribute in UPDATE messages
   that it sends to BGP speakers located in a neighboring autonomous
   system. If it is contained in an UPDATE message that is received from
   a BGP speaker which is not located in the same autonomous system as
   the receiving speaker, then this attribute shall be ignored by the
   receiving speaker.

5.1.6 ATOMIC_AGGREGATE

   ATOMIC_AGGREGATE is a well-known discretionary attribute. If a BGP
   speaker, when presented with a set of overlapping routes from one of
   its peers (see 9.1.4), selects the less specific route without
   selecting the more specific one, then the local system shall attach
   the ATOMIC_AGGREGATE attribute to the route when propagating it to
   other BGP speakers (if that attribute is not already present in the
   received less specific route).
   A BGP speaker that receives a route with the ATOMIC_AGGREGATE
   attribute shall not remove the attribute from the route when
   propagating it to other speakers. A BGP speaker that receives a route
   with the ATOMIC_AGGREGATE attribute shall not make any NLRI of that
   route more specific (as defined in 9.1.4) when advertising this route
   to other BGP speakers. A BGP speaker that receives a route with the
   ATOMIC_AGGREGATE attribute needs to be cognizant of the fact that the
   actual path to destinations, as specified in the NLRI of the route,
   while having the loop-free property, may traverse ASs that are not
   listed in the AS_PATH attribute.

5.1.7 AGGREGATOR

   AGGREGATOR is an optional transitive attribute which may be included
   in updates which are formed by aggregation (see Section 9.2.4.2). A
   BGP speaker which performs route aggregation may add the AGGREGATOR
   attribute which shall contain its own AS number and IP address.

6. BGP Error Handling.

   This section describes actions to be taken when errors are detected
   while processing BGP messages.

   When any of the conditions described here are detected, a
   NOTIFICATION message with the indicated Error Code, Error Subcode,
   and Data fields is sent, and the BGP connection is closed. If no
   Error Subcode is specified, then a zero must be used.

   The phrase "the BGP connection is closed" means that the transport
   protocol connection has been closed and that all resources for that
   BGP connection have been deallocated. Routing table entries
   associated with the remote peer are marked as invalid. The fact that
   the routes have become invalid is passed to other BGP peers before
   the routes are deleted from the system.

   Unless specified explicitly, the Data field of the NOTIFICATION
   message that is sent to indicate an error is empty.

6.1 Message Header error handling.

   All errors detected while processing the Message Header are indicated
   by sending the NOTIFICATION message with Error Code Message Header
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   The expected value of the Marker field of the message header is all
   ones if the message type is OPEN. The expected value of the Marker
   field for all other types of BGP messages is determined based on the
   presence of the Authentication Information Optional Parameter in the
   BGP OPEN message and the actual authentication mechanism (if the
   Authentication Information in the BGP OPEN message is present). If
   the Marker field of the message header is not the expected one, then
   a synchronization error has occurred and the Error Subcode is set to
   Connection Not Synchronized.

   If the Length field of the message header is less than 19 or greater
   than 4096, or if the Length field of an OPEN message is less than
   the minimum length of the OPEN message, or if the Length field of an
   UPDATE message is less than the minimum length of the UPDATE message,
   or if the Length field of a KEEPALIVE message is not equal to 19, or
   if the Length field of a NOTIFICATION message is less than the
   minimum length of the NOTIFICATION message, then the Error Subcode is
   set to Bad Message Length. The Data field contains the erroneous
   Length field.

   If the Type field of the message header is not recognized, then the
   Error Subcode is set to Bad Message Type. The Data field contains
   the erroneous Type field.

6.2 OPEN message error handling.

   All errors detected while processing the OPEN message are indicated
   by sending the NOTIFICATION message with Error Code OPEN Message
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   If the version number contained in the Version field of the received
   OPEN message is not supported, then the Error Subcode is set to
   Unsupported Version Number. The Data field is a 2-octet unsigned
   integer, which indicates the largest locally supported version number
   less than the version the remote BGP peer bid (as indicated in the
   received OPEN message).

   If the Autonomous System field of the OPEN message is unacceptable,
   then the Error Subcode is set to Bad Peer AS. The determination of
   acceptable Autonomous System numbers is outside the scope of this
   protocol.

   If the Hold Time field of the OPEN message is unacceptable, then the
   Error Subcode MUST be set to Unacceptable Hold Time. An
   implementation MUST reject Hold Time values of one or two seconds.
   An implementation MAY reject any proposed Hold Time. An
   implementation which accepts a Hold Time MUST use the negotiated
   value for the Hold Time.

   If the BGP Identifier field of the OPEN message is syntactically
   incorrect, then the Error Subcode is set to Bad BGP Identifier.
   Syntactic correctness means that the BGP Identifier field represents
   a valid IP host address.

   If one of the Optional Parameters in the OPEN message is not
   recognized, then the Error Subcode is set to Unsupported Optional
   Parameter.

   If the OPEN message carries Authentication Information (as an
   Optional Parameter), then the corresponding authentication procedure
   is invoked. If the authentication procedure (based on Authentication
   Code and Authentication Data) fails, then the Error Subcode is set to
   Authentication Failure.

   If the OPEN message carries any other Optional Parameter (other than
   Authentication Information), and the local system does not recognize
   the Parameter, the Parameter shall be ignored.

6.3 UPDATE message error handling.

   All errors detected while processing the UPDATE message are indicated
   by sending the NOTIFICATION message with Error Code UPDATE Message
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   Error checking of an UPDATE message begins by examining the path
   attributes. If the Unfeasible Routes Length or Total Path Attribute
   Length is too large (i.e., if Unfeasible Routes Length + Total Path
   Attribute Length + 23 exceeds the message Length), then the Error
   Subcode is set to Malformed Attribute List.

   If any recognized attribute has Attribute Flags that conflict with
   the Attribute Type Code, then the Error Subcode is set to Attribute
   Flags Error. The Data field contains the erroneous attribute (type,
   length and value).

   If any recognized attribute has an Attribute Length that conflicts
   with the expected length (based on the attribute type code), then the
   Error Subcode is set to Attribute Length Error. The Data field
   contains the erroneous attribute (type, length and value).

   If any of the mandatory well-known attributes are not present, then
   the Error Subcode is set to Missing Well-known Attribute. The Data
   field contains the Attribute Type Code of the missing well-known
   attribute.

   If any of the mandatory well-known attributes are not recognized,
   then the Error Subcode is set to Unrecognized Well-known Attribute.
   The Data field contains the unrecognized attribute (type, length and
   value).

   If the ORIGIN attribute has an undefined value, then the Error
   Subcode is set to Invalid ORIGIN Attribute. The Data field contains
   the unrecognized attribute (type, length and value).

   If the NEXT_HOP attribute field is syntactically incorrect, then the
   Error Subcode is set to Invalid NEXT_HOP Attribute. The Data field
   contains the incorrect attribute (type, length and value).
   Syntactic correctness means that the NEXT_HOP attribute represents a
   valid IP host address. Semantic correctness applies only to the
   external BGP links. It means that the interface associated with the
   IP address, as specified in the NEXT_HOP attribute, shares a common
   subnet with the receiving BGP speaker and is not the IP address of
   the receiving BGP speaker. If the NEXT_HOP attribute is semantically
   incorrect, the error should be logged, and the route should be
   ignored. In this case, no NOTIFICATION message should be sent.

   The AS_PATH attribute is checked for syntactic correctness. If the
   path is syntactically incorrect, then the Error Subcode is set to
   Malformed AS_PATH.

   The information carried by the AS_PATH attribute is checked for AS
   loops. AS loop detection is done by scanning the full AS path (as
   specified in the AS_PATH attribute), and checking that the autonomous
   system number of the local system does not appear in the AS path. If
   the autonomous system number appears in the AS path, the route may be
   stored in the Adj-RIB-In, but unless the router is configured to
   accept routes with its own autonomous system in the AS path, the
   route shall not be passed to the BGP Decision Process. Operations of
   a router that is configured to accept routes with its own autonomous
   system number in the AS path are outside the scope of this document.

   If an optional attribute is recognized, then the value of this
   attribute is checked. If an error is detected, the attribute is
   discarded, and the Error Subcode is set to Optional Attribute Error.
   The Data field contains the attribute (type, length and value).

   If any attribute appears more than once in the UPDATE message, then
   the Error Subcode is set to Malformed Attribute List.

   The NLRI field in the UPDATE message is checked for syntactic
   validity.
   If the field is syntactically incorrect, then the Error Subcode is
   set to Invalid Network Field.

6.4 NOTIFICATION message error handling.

   If a peer sends a NOTIFICATION message, and there is an error in that
   message, there is unfortunately no means of reporting this error via
   a subsequent NOTIFICATION message. Any such error, such as an
   unrecognized Error Code or Error Subcode, should be noticed, logged
   locally, and brought to the attention of the administration of the
   peer. The means to do this, however, lies outside the scope of this
   document.

6.5 Hold Timer Expired error handling.

   If a system does not receive successive KEEPALIVE and/or UPDATE
   and/or NOTIFICATION messages within the period specified in the Hold
   Time field of the OPEN message, then the NOTIFICATION message with
   Hold Timer Expired Error Code must be sent and the BGP connection
   closed.

6.6 Finite State Machine error handling.

   Any error detected by the BGP Finite State Machine (e.g., receipt of
   an unexpected event) is indicated by sending the NOTIFICATION message
   with Error Code Finite State Machine Error.

6.7 Cease.

   In the absence of any fatal errors (that are indicated in this
   section), a BGP peer may choose at any given time to close its BGP
   connection by sending the NOTIFICATION message with Error Code Cease.
   However, the Cease NOTIFICATION message must not be used when a fatal
   error indicated by this section does exist.

6.8 Connection collision detection.

   If a pair of BGP speakers try simultaneously to establish a TCP
   connection to each other, then two parallel connections between this
   pair of speakers might well be formed. We refer to this situation as
   a connection collision. Clearly, one of these connections must be
   closed.

   Based on the value of the BGP Identifier, a convention is established
   for detecting which BGP connection is to be preserved when a
   collision does occur. The convention is to compare the BGP
   Identifiers of the peers involved in the collision and to retain only
   the connection initiated by the BGP speaker with the higher-valued
   BGP Identifier.

   Upon receipt of an OPEN message, the local system must examine all of
   its connections that are in the OpenConfirm state. A BGP speaker may
   also examine connections in an OpenSent state if it knows the BGP
   Identifier of the peer by means outside of the protocol. If among
   these connections there is a connection to a remote BGP speaker whose
   BGP Identifier equals the one in the OPEN message, then the local
   system performs the following collision resolution procedure:

      1. The BGP Identifier of the local system is compared to the BGP
         Identifier of the remote system (as specified in the OPEN
         message).

      2. If the value of the local BGP Identifier is less than the
         remote one, the local system closes the BGP connection that
         already exists (the one that is already in the OpenConfirm
         state), and accepts the BGP connection initiated by the remote
         system.

      3. Otherwise, the local system closes the newly created BGP
         connection (the one associated with the newly received OPEN
         message), and continues to use the existing one (the one that
         is already in the OpenConfirm state).

   Comparing BGP Identifiers is done by treating them as (4-octet long)
   unsigned integers.

   A connection collision with an existing BGP connection that is in
   the Established state causes unconditional closing of the newly
   created connection. Note that a connection collision cannot be
   detected with connections that are in the Idle, Connect, or Active
   states.
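
   The collision resolution procedure can be sketched in Python as
   below. The function names and the string state labels are
   hypothetical. BGP Identifiers are taken as dotted-quad strings and
   converted with the standard ipaddress module so that they compare as
   4-octet unsigned integers, as the text requires.

```python
import ipaddress
from typing import Optional


def bgp_id_value(bgp_id: str) -> int:
    """Interpret a dotted-quad BGP Identifier as a 4-octet unsigned int."""
    return int(ipaddress.IPv4Address(bgp_id))


def resolve_collision(local_id: str, remote_id: str,
                      existing_state: str) -> Optional[str]:
    """Pick the connection to close when an OPEN arrives from remote_id.

    existing_state is the state of the connection that already exists to
    the same BGP Identifier. Returns "existing", "new", or None when no
    collision can be detected in that state.
    """
    if existing_state == "Established":
        return "new"  # unconditionally close the newly created connection
    if existing_state not in ("OpenConfirm", "OpenSent"):
        return None   # Idle, Connect, Active: collision not detectable
    # OpenSent connections are only examined when the peer's Identifier
    # is known by means outside the protocol.
    if bgp_id_value(local_id) < bgp_id_value(remote_id):
        return "existing"  # keep the connection the remote side initiated
    return "new"           # keep the existing connection
```

   For example, resolve_collision("10.0.0.9", "10.0.0.10", "OpenConfirm")
   returns "existing", because 10.0.0.9 is the smaller Identifier.
   Comparing the dotted-quad strings directly would give the opposite
   answer, since "10.0.0.9" sorts after "10.0.0.10" as text.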

   Closing the BGP connection (that results from the collision
   resolution procedure) is accomplished by sending the NOTIFICATION
   message with the Error Code Cease.

7. BGP Version Negotiation.

   BGP speakers may negotiate the version of the protocol by making
   multiple attempts to open a BGP connection, starting with the highest
   version number each supports. If an open attempt fails with an Error
   Code OPEN Message Error, and an Error Subcode Unsupported Version
   Number, then the BGP speaker has available the version number it
   tried, the version number its peer tried, the version number passed
   by its peer in the NOTIFICATION message, and the version numbers that
   it supports. If the two peers do support one or more common
   versions, then this will allow them to rapidly determine the highest
   common version. In order to support BGP version negotiation, future
   versions of BGP must retain the format of the OPEN and NOTIFICATION
   messages.

8. BGP Finite State Machine.

   This section specifies BGP operation in terms of a Finite State
   Machine (FSM). Following is a brief summary and overview of BGP
   operations by state as determined by this FSM. A condensed version
   of the BGP FSM is found in Appendix 1.

   Initially BGP is in the Idle state.

      Idle state:

         In this state BGP refuses all incoming BGP connections. No
         resources are allocated to the peer. In response to the Start
         event (initiated by either system or operator) the local system
         initializes all BGP resources, starts the ConnectRetry timer,
         initiates a transport connection to the other BGP peer, while
         listening for a connection that may be initiated by the remote
         BGP peer, and changes its state to Connect. The exact value of
         the ConnectRetry timer is a local matter, but should be
         sufficiently large to allow TCP initialization.

If a BGP speaker detects an error, it shuts down the connection and changes its state to Idle. Getting out of the Idle state requires generation of the Start event. If such an event is generated automatically, then persistent BGP errors may result in persistent flapping of the speaker. To avoid such a condition, it is recommended that Start events not be generated immediately for a peer that was previously transitioned to Idle due to an error. For such a peer, if Start events are generated automatically, the time between consecutive Start events shall increase exponentially. The value of the initial timer shall be 60 seconds, and the time shall be doubled for each consecutive retry.

Any other event received in the Idle state is ignored.

Connect state:

In this state BGP is waiting for the transport protocol connection to be completed.

If the transport protocol connection succeeds, the local system clears the ConnectRetry timer, completes initialization, sends an OPEN message to its peer, and changes its state to OpenSent.

If the transport protocol connection fails (e.g., retransmission timeout), the local system restarts the ConnectRetry timer, continues to listen for a connection that may be initiated by the remote BGP peer, and changes its state to Active.

In response to the ConnectRetry timer expired event, the local system restarts the ConnectRetry timer, initiates a transport connection to the other BGP peer, continues to listen for a connection that may be initiated by the remote BGP peer, and stays in the Connect state.

The Start event is ignored in the Connect state.

In response to any other event (initiated by either the system or the operator), the local system releases all BGP resources associated with this connection and changes its state to Idle.

Active state:

In this state BGP is trying to acquire a peer by initiating a transport protocol connection.

If the transport protocol connection succeeds, the local system clears the ConnectRetry timer, completes initialization, sends an OPEN message to its peer, sets its Hold Timer to a large value, and changes its state to OpenSent. A Hold Timer value of 4 minutes is suggested.

In response to the ConnectRetry timer expired event, the local system restarts the ConnectRetry timer, initiates a transport connection to the other BGP peer, continues to listen for a connection that may be initiated by the remote BGP peer, and changes its state to Connect.

If the local system detects that a remote peer is trying to establish a BGP connection to it, and the IP address of the remote peer is not an expected one, the local system restarts the ConnectRetry timer, rejects the attempted connection, continues to listen for a connection that may be initiated by the remote BGP peer, and stays in the Active state.

The Start event is ignored in the Active state.

In response to any other event (initiated by either the system or the operator), the local system releases all BGP resources associated with this connection and changes its state to Idle.

OpenSent state:

In this state BGP waits for an OPEN message from its peer. When an OPEN message is received, all fields are checked for correctness. If the BGP message header checking or OPEN message checking detects an error (see Section 6.2), or a connection collision is detected (see Section 6.8), the local system sends a NOTIFICATION message and changes its state to Idle.

If there are no errors in the OPEN message, BGP sends a KEEPALIVE message and sets a KeepAlive timer. The Hold Timer, which was originally set to a large value (see above), is replaced with the negotiated Hold Time value (see Section 4.2). If the negotiated Hold Time value is zero, then the Hold Timer and KeepAlive timer are not started. If the value of the Autonomous System field is the same as the local Autonomous System number, then the connection is an "internal" connection; otherwise, it is "external". (This will affect UPDATE processing as described below.) Finally, the state is changed to OpenConfirm.

If a disconnect notification is received from the underlying transport protocol, the local system closes the BGP connection, restarts the ConnectRetry timer, continues to listen for a connection that may be initiated by the remote BGP peer, and goes into the Active state.

If the Hold Timer expires, the local system sends a NOTIFICATION message with the Error Code Hold Timer Expired and changes its state to Idle.

In response to the Stop event (initiated by either the system or the operator), the local system sends a NOTIFICATION message with the Error Code Cease and changes its state to Idle.

The Start event is ignored in the OpenSent state.

In response to any other event, the local system sends a NOTIFICATION message with the Error Code Finite State Machine Error and changes its state to Idle.

Whenever BGP changes its state from OpenSent to Idle, it closes the BGP (and transport-level) connection and releases all resources associated with that connection.

OpenConfirm state:

In this state BGP waits for a KEEPALIVE or NOTIFICATION message.

If the local system receives a KEEPALIVE message, it changes its state to Established.

If the Hold Timer expires before a KEEPALIVE message is received, the local system sends a NOTIFICATION message with the Error Code Hold Timer Expired and changes its state to Idle.

If the local system receives a NOTIFICATION message, it changes its state to Idle.

If the KeepAlive timer expires, the local system sends a KEEPALIVE message and restarts its KeepAlive timer.

If a disconnect notification is received from the underlying transport protocol, the local system changes its state to Idle.

In response to the Stop event (initiated by either the system or the operator), the local system sends a NOTIFICATION message with the Error Code Cease and changes its state to Idle.

The Start event is ignored in the OpenConfirm state.

In response to any other event, the local system sends a NOTIFICATION message with the Error Code Finite State Machine Error and changes its state to Idle.

Whenever BGP changes its state from OpenConfirm to Idle, it closes the BGP (and transport-level) connection and releases all resources associated with that connection.

Established state:

In the Established state BGP can exchange UPDATE, NOTIFICATION, and KEEPALIVE messages with its peer.

If the local system receives an UPDATE or KEEPALIVE message, it restarts its Hold Timer, if the negotiated Hold Time value is non-zero.

If the local system receives a NOTIFICATION message, it changes its state to Idle.

If the local system receives an UPDATE message and the UPDATE message error handling procedure (see Section 6.3) detects an error, the local system sends a NOTIFICATION message and changes its state to Idle.

If a disconnect notification is received from the underlying transport protocol, the local system changes its state to Idle.

If the Hold Timer expires, the local system sends a NOTIFICATION message with the Error Code Hold Timer Expired and changes its state to Idle.

If the KeepAlive timer expires, the local system sends a KEEPALIVE message and restarts its KeepAlive timer.

Each time the local system sends a KEEPALIVE or UPDATE message, it restarts its KeepAlive timer, unless the negotiated Hold Time value is zero.

In response to the Stop event (initiated by either the system or the operator), the local system sends a NOTIFICATION message with the Error Code Cease and changes its state to Idle.

The Start event is ignored in the Established state.

In response to any other event, the local system sends a NOTIFICATION message with the Error Code Finite State Machine Error and changes its state to Idle.

Whenever BGP changes its state from Established to Idle, it closes the BGP (and transport-level) connection, releases all resources associated with that connection, and deletes all routes derived from that connection.

9. UPDATE Message Handling

An UPDATE message may be received only in the Established state. When an UPDATE message is received, each field is checked for validity as specified in Section 6.3.

If an optional non-transitive attribute is unrecognized, it is quietly ignored. If an optional transitive attribute is unrecognized, the Partial bit (the third high-order bit) in the attribute flags octet is set to 1, and the attribute is retained for propagation to other BGP speakers.

If an optional attribute is recognized and has a valid value, then, depending on the type of the optional attribute, it is processed locally, retained, and updated, if necessary, for possible propagation to other BGP speakers.
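
The handling of unrecognized optional attributes reduces to a few bit tests on the attribute flags octet. In the sketch below, the Optional (0x80) and Transitive (0x40) masks follow the path attribute flag layout defined earlier in this specification, and the Partial mask (0x20) is the third high-order bit named above. The function name is hypothetical, and raising an error for an unrecognized well-known attribute simply defers to the UPDATE error handling of Section 6.3.

```python
from typing import Optional

OPTIONAL = 0x80    # high-order bit of the attribute flags octet
TRANSITIVE = 0x40  # second high-order bit
PARTIAL = 0x20     # third high-order bit

def flags_for_unrecognized(flags: int) -> Optional[int]:
    """Return the flags to propagate an unrecognized attribute with, or None.

    None means the attribute is quietly ignored and not propagated.
    """
    if not flags & OPTIONAL:
        # Unrecognized well-known attributes are UPDATE errors (Section 6.3).
        raise ValueError("unrecognized well-known attribute")
    if not flags & TRANSITIVE:
        return None                # optional non-transitive: quietly ignore
    return flags | PARTIAL         # optional transitive: retain, set Partial
```

Other flag bits, such as Extended Length (0x10), pass through unchanged, so an attribute arriving with flags 0xD0 is propagated with flags 0xF0.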

If the UPDATE message contains a non-empty WITHDRAWN ROUTES field, the previously advertised routes whose destinations (expressed as IP prefixes) are contained in this field shall be removed from the Adj-RIB-In. This BGP speaker shall run its Decision Process, since the previously advertised route is no longer available for use.

If the UPDATE message contains a feasible route, it shall be placed in the appropriate Adj-RIB-In, and the following additional actions shall be taken:

   i) If its Network Layer Reachability Information (NLRI) is identical to that of a route currently stored in the Adj-RIB-In, then the new route shall replace the older route in the Adj-RIB-In, thus implicitly withdrawing the older route from service. The BGP speaker shall run its Decision Process, since the older route is no longer available for use.

   ii) If the new route is an overlapping route that is included (see 9.1.4) in an earlier route contained in the Adj-RIB-In, the BGP speaker shall run its Decision Process, since the more specific route has implicitly made a portion of the less specific route unavailable for use.

   iii) If the new route has identical path attributes to an earlier route contained in the Adj-RIB-In, and is more specific (see 9.1.4) than the earlier route, no further actions are necessary.

   iv) If the new route has NLRI that is not present in any of the routes currently stored in the Adj-RIB-In, then the new route shall be placed in the Adj-RIB-In. The BGP speaker shall run its Decision Process.

   v) If the new route is an overlapping route that is less specific (see 9.1.4) than an earlier route contained in the Adj-RIB-In, the BGP speaker shall run its Decision Process on the set of destinations described only by the less specific route.
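
The case analysis above can be sketched with Python's standard `ipaddress` module, whose `subnet_of` and `address_exclude` methods express the "more specific" and "less specific" relationships of 9.1.4. This is a simplified sketch under stated assumptions: a single Adj-RIB-In is modeled as a dictionary from IPv4 prefixes to a hypothetical attribute dictionary, the function names are illustrative, and when more than one case could apply, the checks are simply taken in the order i), iii)/ii), v), iv).

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple

Prefix = ipaddress.IPv4Network
AdjRibIn = Dict[Prefix, dict]   # prefix -> path attributes (hypothetical shape)

def receive_feasible_route(rib: AdjRibIn, nlri: Prefix,
                           attrs: dict) -> Tuple[str, bool]:
    """Store a feasible route; return (case, run_decision_process)."""
    if nlri in rib:
        case, run = "i", True         # identical NLRI: implicit replacement
    else:
        covering = [p for p in rib if nlri.subnet_of(p)]
        if covering and any(rib[p] == attrs for p in covering):
            case, run = "iii", False  # more specific, identical attributes
        elif covering:
            case, run = "ii", True    # more specific: hides part of a covering route
        elif any(p.subnet_of(nlri) for p in rib):
            case, run = "v", True     # less specific than an existing route
        else:
            case, run = "iv", True    # NLRI not present in any stored route
    rib[nlri] = attrs                 # every feasible route goes into the Adj-RIB-In
    return case, run

def only_less_specific(less: Prefix, more_specifics: Iterable[Prefix]) -> List[Prefix]:
    """Destinations described only by the less specific route (see 9.1.4)."""
    remaining = [less]
    for m in more_specifics:
        nxt: List[Prefix] = []
        for r in remaining:
            if m.subnet_of(r):
                nxt.extend(r.address_exclude(m))  # carve m out of r
            elif not r.subnet_of(m):
                nxt.append(r)                     # disjoint: keep r unchanged
        remaining = nxt
    return sorted(remaining)
```

For case v), `only_less_specific` yields the set of destinations the Decision Process must re-run on: removing 10.0.1.0/24 from 10.0.0.0/22 leaves 10.0.0.0/24 and 10.0.2.0/23.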

9.1 Decision Process

The Decision Process selects routes for subsequent advertisement by applying the policies in the local Policy Information Base (PIB) to the routes stored in its Adj-RIBs-In. The output of the Decision Process is the set of routes that will be advertised to all peers; the selected routes will be stored in the local speaker's Adj-RIBs-Out.

The selection process is formalized by defining a function that takes the attributes of a given route as an argument and returns a non-negative integer denoting the degree of preference for the route. The function that calculates the degree of preference for a given route shall not use as its inputs any of the following: the existence of other routes, the non-existence of other routes, or the path attributes of other routes. Route selection then consists of individual application of the degree of preference function to each feasible route, followed by the choice of the one with the highest degree of preference.

The Decision Process operates on routes contained in each Adj-RIB-In, and is responsible for:

   - selection of routes to be advertised to BGP speakers located in the local speaker's autonomous system

   - selection of routes to be advertised to BGP speakers located in neighboring autonomous systems

   - route aggregation and route information reduction

The Decision Process takes place in three distinct phases, each triggered by a different event:

   a) Phase 1 is responsible for calculating the degree of preference for each route received from a BGP speaker located in a neighboring autonomous system, and for advertising to the other BGP speakers in the local autonomous system the routes that have the highest degree of preference for each distinct destination.

   b) Phase 2 is invoked on completion of Phase 1.
      It is responsible for choosing the best route out of all those available for each distinct destination, and for installing each chosen route into the appropriate Loc-RIB.

   c) Phase 3 is invoked after the Loc-RIB has been modified. It is responsible for disseminating routes in the Loc-RIB to each peer located in a neighboring autonomous system, according to the policies contained in the PIB. Route aggregation and information reduction can optionally be performed within this phase.

9.1.1 Phase 1: Calculation of Degree of Preference

The Phase 1 decision function shall be invoked whenever the local BGP speaker receives an UPDATE message from a peer located in a neighboring autonomous system that advertises a new route, a replacement route, or a withdrawn route.

The Phase 1 decision function is a separate process which completes when it has no further work to do.

The Phase 1 decision function shall lock an Adj-RIB-In prior to operating on any route contained within it, and shall unlock it after operating on all new or unfeasible routes contained within it.

For each newly received or replacement feasible route, the local BGP speaker shall determine a degree of preference. If the route is learned from a BGP speaker in the local autonomous system, either the value of the LOCAL_PREF attribute shall be taken as the degree of preference, or the local system shall compute the degree of preference of the route based on preconfigured policy information. If the route is learned from a BGP speaker in a neighboring autonomous system, then the degree of preference shall be computed based on preconfigured policy information. The exact nature of this policy information and the computation involved is a local matter. The local speaker shall then run the internal update process of 9.2.1 to select and advertise the most preferable route.
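
Because the degree-of-preference function may look only at the route it is given, it can be written as a pure function of that route's attributes. The sketch below assumes a hypothetical `Route` record and a hypothetical local policy that prefers shorter AS paths; the specification leaves the policy itself as a local matter. The `use_local_pref` switch models the choice the text allows for routes learned from the local autonomous system.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    from_local_as: bool                 # learned from a BGP speaker in the local AS
    local_pref: Optional[int] = None    # LOCAL_PREF attribute, if carried

Policy = Callable[[Route], int]

def degree_of_preference(route: Route, policy: Policy,
                         use_local_pref: bool = True) -> int:
    """Return a non-negative preference computed from this route alone.

    No other route is consulted, as the Decision Process requires.
    """
    if route.from_local_as and use_local_pref and route.local_pref is not None:
        pref = route.local_pref         # take LOCAL_PREF as the preference
    else:
        pref = policy(route)            # preconfigured local policy
    if pref < 0:
        raise ValueError("degree of preference must be non-negative")
    return pref

def prefer_short_paths(route: Route) -> int:
    """Hypothetical policy: 100 minus 10 per AS in the path, floored at 0."""
    return max(0, 100 - 10 * len(route.as_path))
```

With this policy, an externally learned route with a two-AS path gets preference 80, while an internal route carrying LOCAL_PREF 200 gets 200 unless `use_local_pref=False` is passed.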

9.1.2 Phase 2: Route Selection

The Phase 2 decision function shall be invoked on completion of Phase 1. The Phase 2 function is a separate process which completes when it has no further work to do. The Phase 2 process shall consider all routes that are present in the Adj-RIBs-In, including those received from BGP speakers located in its own autonomous system and those received from BGP speakers located in neighboring autonomous systems.

The Phase 2 decision function shall be blocked from running while the Phase 3 decision function is in process. The Phase 2 function shall lock all Adj-RIBs-In prior to commencing its function, and shall unlock them on completion.

If the NEXT_HOP attribute of a BGP route depicts an address to which the local BGP speaker does not have a route in its Loc-RIB, the BGP route SHOULD be excluded from the Phase 2 decision function.

For each set of destinations for which a feasible route exists in the Adj-RIBs-In, the local BGP speaker shall identify the route that:

   a) has the highest degree of preference of any route to the same set of destinations, or

   b) is the only route to that destination, or

   c) is selected as a result of the Phase 2 tie-breaking rules specified in 9.1.2.1.

The local speaker SHALL then install that route in the Loc-RIB, replacing any route to the same destination that is currently being held in the Loc-RIB. The local speaker MUST determine the immediate next hop to the address depicted by the NEXT_HOP attribute of the selected route by performing a lookup in the IGP and selecting one of the possible paths in the IGP. This immediate next hop MUST be used when installing the selected route in the Loc-RIB.
If the route to the address depicted by the NEXT_HOP attribute changes such that the immediate next hop changes, route selection should be recalculated as specified above.

Unfeasible routes shall be removed from the Loc-RIB, and corresponding unfeasible routes shall then be removed from the Adj-RIBs-In.

9.1.2.1 Breaking Ties (Phase 2)

In its Adj-RIBs-In a BGP speaker may have several routes to the same destination that have the same degree of preference. The local speaker can select only one of these routes for inclusion in the associated Loc-RIB. The local speaker considers all equally preferable routes, both those received from BGP speakers located in neighboring autonomous systems, and those received from other BGP speakers located in the local speaker's autonomous system.

The following tie-breaking procedure assumes that for each candidate route all the BGP speakers within an autonomous system can ascertain the cost of a path (interior distance) to the address depicted by the NEXT_HOP attribute of the route. Ties shall be broken according to the following algorithm:

   a) If the local system is configured to take into account MULTI_EXIT_DISC, and the candidate routes differ in their MULTI_EXIT_DISC attribute, select the route that has the lowest value of the MULTI_EXIT_DISC attribute. A route with MULTI_EXIT_DISC shall be preferred to a route without MULTI_EXIT_DISC.

   b) Otherwise, select the route that has the lowest cost (interior distance) to the entity depicted by the NEXT_HOP attribute of the route.
      If there are several routes with the same cost, then the tie shall be broken as follows:

      - if at least one of the candidate routes was advertised by a BGP speaker in a neighboring autonomous system, select the route that was advertised by the BGP speaker in a neighboring autonomous system whose BGP Identifier has the lowest value among all BGP speakers in neighboring autonomous systems;

      - otherwise, select the route that was advertised by the BGP speaker whose BGP Identifier has the lowest value.

9.1.3 Phase 3: Route Dissemination

The Phase 3 decision function shall be invoked on completion of Phase 2, or when any of the following events occur:

   a) when routes in a Loc-RIB to local destinations have changed

   b) when locally generated routes learned by means outside of BGP have changed

   c) when a new BGP speaker-to-BGP speaker connection has been established

The Phase 3 function is a separate process which completes when it has no further work to do. The Phase 3 decision function shall be blocked from running while the Phase 2 decision function is in process.

All routes in the Loc-RIB shall be processed into a corresponding entry in the associated Adj-RIBs-Out. Route aggregation and information reduction techniques (see 9.2.4.1) may optionally be applied.

For the benefit of future support of inter-AS multicast capabilities, a BGP speaker that participates in inter-AS multicast routing shall, if it installs in its Loc-RIB a route received from one of its external peers, advertise that route back to the peer from which it was received. For a BGP speaker that does not participate in inter-AS multicast routing, such an advertisement is optional. When doing such an advertisement, the NEXT_HOP attribute should be set to the address of the peer.
An implementation may also optimize such an advertisement by truncating information in the AS_PATH attribute to include only its own AS number and that of the peer that advertised the route (such truncation requires the ORIGIN attribute to be set to INCOMPLETE). In addition, an implementation is not required to pass optional or discretionary path attributes with such an advertisement.

When the updating of the Adj-RIBs-Out and the Forwarding Information Base (FIB) is complete, the local BGP speaker shall run the external update process of 9.2.2.

9.1.4 Overlapping Routes

A BGP speaker may transmit routes with overlapping Network Layer Reachability Information (NLRI) to another BGP speaker. NLRI overlap occurs when a set of destinations is identified in non-matching multiple routes. Since BGP encodes NLRI using IP prefixes, overlap will always exhibit subset relationships. A route describing a smaller set of destinations (a longer prefix) is said to be more specific than a route describing a larger set of destinations (a shorter prefix); similarly, a route describing a larger set of destinations (a shorter prefix) is said to be less specific than a route describing a smaller set of destinations (a longer prefix).

The precedence relationship effectively decomposes less specific routes into two parts:

   - a set of destinations described only by the less specific route, and

   - a set of destinations described by the overlap of the less specific and the more specific routes

When overlapping routes are present in the same Adj-RIB-In, the more specific route shall take precedence, in order from most specific to least specific.

The set of destinations described by the overlap represents a portion of the less specific route that is feasible, but is not currently in use.
If a more specific route is later withdrawn, the set of destinations described by the overlap will still be reachable using the less specific route.

If a BGP speaker receives overlapping routes, the Decision Process shall take into account the semantics of the overlapping routes. In particular, if a BGP speaker accepts the less specific route while rejecting the more specific route from the same peer, then traffic to the destinations represented by the overlap may not be forwarded along the ASs listed in the AS_PATH attribute of that route. Therefore, a BGP speaker has the following choices:

   a) Install both the less and the more specific routes

   b) Install the more specific route only

   c) Install the non-overlapping part of the less specific route only (that implies de-aggregation)

   d) Aggregate the two routes and install the aggregated route

   e) Install the less specific route only

   f) Install neither route

If a BGP speaker chooses e), then it should add the ATOMIC_AGGREGATE attribute to the route. A route that carries the ATOMIC_AGGREGATE attribute cannot be de-aggregated; that is, the NLRI of this route cannot be made more specific. Forwarding along such a route does not guarantee that IP packets will actually traverse only the ASs listed in the AS_PATH attribute of the route. If a BGP speaker chooses a), it must not advertise the more general route without the more specific route.

9.2 Update-Send Process

The Update-Send process is responsible for advertising UPDATE messages to all peers. For example, it distributes the routes chosen by the Decision Process to other BGP speakers, which may be located in either the same autonomous system or a neighboring autonomous system.
Rules for information exchange between BGP speakers located in different autonomous systems are given in 9.2.2; rules for information exchange between BGP speakers located in the same autonomous system are given in 9.2.1.

Distribution of routing information between a set of BGP speakers, all of which are located in the same autonomous system, is referred to as internal distribution.

9.2.1 Internal Updates

The Internal update process is concerned with the distribution of routing information to BGP speakers located in the local speaker's autonomous system.

When a BGP speaker receives an UPDATE message from another BGP speaker located in its own autonomous system, the receiving BGP speaker shall not re-distribute the routing information contained in that UPDATE message to other BGP speakers located in its own autonomous system.

When a BGP speaker receives a new route from a BGP speaker in a neighboring autonomous system, it shall advertise that route to all other BGP speakers in its autonomous system by means of an UPDATE message if any of the following conditions occur:

   1) the degree of preference assigned to the newly received route by the local BGP speaker is higher than the degree of preference that the local speaker has assigned to other routes that have been received from BGP speakers in neighboring autonomous systems, or

   2) there are no other routes that have been received from BGP speakers in neighboring autonomous systems, or

   3) the newly received route is selected as a result of breaking a tie between several routes which have the highest degree of preference, and the same destination (the tie-breaking procedure is specified in 9.2.1.1).
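
The three conditions above can be sketched as a single predicate. The sketch assumes, as the surrounding text implies but does not state outright, that "other routes" means other routes to the same destination. The `wins_tie` callback is a hypothetical stand-in for the tie-breaking rules of 9.2.1.1, and the `from_local_as` flag encodes the rule that routes learned from internal peers are never re-distributed to other internal peers.

```python
from typing import Callable, Sequence

def advertise_to_internal_peers(new_pref: int,
                                other_external_prefs: Sequence[int],
                                wins_tie: Callable[[], bool],
                                from_local_as: bool = False) -> bool:
    """Decide whether a newly received route is sent to internal peers.

    other_external_prefs: degrees of preference of other routes to the same
    destination learned from neighboring ASs.
    wins_tie: runs the 9.2.1.1 tie-breaking rules (hypothetical callback);
    consulted only when the new route ties for the highest preference.
    """
    if from_local_as:
        return False                         # never re-distribute internal routes
    if not other_external_prefs:
        return True                          # condition 2: no competing routes
    best_other = max(other_external_prefs)
    if new_pref > best_other:
        return True                          # condition 1: strictly preferred
    if new_pref == best_other:
        return wins_tie()                    # condition 3: won the tie-break
    return False
```

Passing the tie-break as a callback keeps the comparatively expensive 9.2.1.1 procedure out of the common case, where preferences already differ.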

When a BGP speaker receives an UPDATE message with a non-empty WITHDRAWN ROUTES field, it shall remove from its Adj-RIB-In all routes whose destinations were carried in this field (as IP prefixes). The speaker shall take the following additional steps:

   1) if the corresponding feasible route had not been previously advertised, then no further action is necessary

   2) if the corresponding feasible route had been previously advertised, then:

      i) if a new route is selected for advertisement that has the same Network Layer Reachability Information as the unfeasible routes, then the local BGP speaker shall advertise the replacement route

      ii) if a replacement route is not available for advertisement, then the BGP speaker shall include the destinations of the unfeasible route (in the form of IP prefixes) in the WITHDRAWN ROUTES field of an UPDATE message, and shall send this message to each peer to whom it had previously advertised the corresponding feasible route.

All feasible routes which are advertised shall be placed in the appropriate Adj-RIBs-Out, and all unfeasible routes which are advertised shall be removed from the Adj-RIBs-Out.

9.2.1.1 Breaking Ties (Internal Updates)

If a local BGP speaker has connections to several BGP speakers in neighboring autonomous systems, there will be multiple Adj-RIBs-In associated with these peers. These Adj-RIBs-In might contain several equally preferable routes to the same destination, all of which were advertised by BGP speakers located in neighboring autonomous systems.
The local BGP speaker shall select one of these routes according to the following rules:

   a) If the candidate routes differ only in their NEXT_HOP and MULTI_EXIT_DISC attributes, and the local system is configured to take into account the MULTI_EXIT_DISC attribute, select the route that has the lowest value of the MULTI_EXIT_DISC attribute. A route with the MULTI_EXIT_DISC attribute shall be preferred to a route without the MULTI_EXIT_DISC attribute.

   b) If the local system can ascertain the cost of a path to the entity depicted by the NEXT_HOP attribute of the candidate route, select the route with the lowest cost.

   c) In all other cases, select the route that was advertised by the BGP speaker whose BGP Identifier has the lowest value.

9.2.2 External Updates

The external update process is concerned with the distribution of routing information to BGP speakers located in neighboring autonomous systems. As part of the Phase 3 route selection process, the BGP speaker has updated its Adj-RIBs-Out and its Forwarding Table. All newly installed routes and all newly unfeasible routes for which there is no replacement route shall be advertised to BGP speakers located in neighboring autonomous systems by means of an UPDATE message.

Any routes in the Loc-RIB marked as unfeasible shall be removed. Changes to the reachable destinations within its own autonomous system shall also be advertised in an UPDATE message.

9.2.3 Controlling Routing Traffic Overhead

The BGP protocol constrains the amount of routing traffic (that is, UPDATE messages) in order to limit both the link bandwidth needed to advertise UPDATE messages and the processing power needed by the Decision Process to digest the information contained in the UPDATE messages.

9.2.3.1 Frequency of Route Advertisement

The parameter MinRouteAdvertisementInterval determines the minimum amount of time that must elapse between advertisements of routes to a particular destination from a single BGP speaker. This rate-limiting procedure applies on a per-destination basis, although the value of MinRouteAdvertisementInterval is set on a per-BGP-peer basis.

Two UPDATE messages sent from a single BGP speaker that advertise feasible routes to some common set of destinations received from BGP speakers in neighboring autonomous systems must be separated by at least MinRouteAdvertisementInterval. This can be achieved precisely only by keeping a separate timer for each common set of destinations, which would be unwarranted overhead. Any technique is acceptable that ensures the interval between two such UPDATE messages is at least MinRouteAdvertisementInterval and that also ensures a constant upper bound on the interval.

Since fast convergence is needed within an autonomous system, this procedure does not apply to routes received from other BGP speakers in the same autonomous system. To avoid long-lived black holes, the procedure does not apply to the explicit withdrawal of unfeasible routes (that is, routes whose destinations (expressed as IP prefixes) are listed in the WITHDRAWN ROUTES field of an UPDATE message).

This procedure does not limit the rate of route selection, but only the rate of route advertisement. If new routes are selected multiple times while awaiting the expiration of MinRouteAdvertisementInterval, the last route selected shall be advertised at the end of MinRouteAdvertisementInterval.
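
One way to meet these requirements can be sketched as a small per-peer object. For clarity it keeps a timestamp for each destination, which is exactly the precise approach the text notes would be unwarranted overhead; a production speaker would more likely batch destinations under a coarser timer. The class and method names are hypothetical, and time is passed in explicitly (in seconds) so the behavior is deterministic. Withdrawals and routes learned from the local autonomous system bypass the limiter, and only the last route selected during an interval is advertised when it ends.

```python
from typing import Dict, List, Optional, Tuple

class MinRouteAdvertisementLimiter:
    """Per-peer limiter for route advertisements (illustrative sketch)."""

    def __init__(self, interval: float) -> None:
        self.interval = interval                 # MinRouteAdvertisementInterval (s)
        self.last_sent: Dict[str, float] = {}    # destination -> last advertisement time
        self.pending: Dict[str, str] = {}        # destination -> latest selected route

    def select(self, dest: str, route: str, now: float,
               from_local_as: bool = False) -> Optional[str]:
        """Record a newly selected route; return it if it may be sent now."""
        last = self.last_sent.get(dest)
        if from_local_as or last is None or now - last >= self.interval:
            self.last_sent[dest] = now
            self.pending.pop(dest, None)
            return route
        self.pending[dest] = route               # a later selection replaces an earlier one
        return None

    def withdraw(self, dest: str) -> None:
        """Withdrawals are sent at once; discard any delayed advertisement."""
        self.pending.pop(dest, None)

    def flush(self, now: float) -> List[Tuple[str, str]]:
        """Release pending routes whose interval has elapsed."""
        ready = [d for d in self.pending if now - self.last_sent[d] >= self.interval]
        released = []
        for dest in ready:
            released.append((dest, self.pending.pop(dest)))
            self.last_sent[dest] = now
        return released
```

Calling `flush` periodically keeps the delay bounded by the interval plus the polling period, which is one way to provide the constant upper bound the text asks for.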

9.2.3.2 Frequency of Route Origination

The parameter MinASOriginationInterval determines the minimum amount of time that must elapse between successive advertisements of UPDATE messages that report changes within the advertising BGP speaker's own autonomous system.

9.2.3.3 Jitter

To minimize the likelihood that the distribution of BGP messages by a given BGP speaker will contain peaks, jitter should be applied to the timers associated with MinASOriginationInterval, Keepalive, and MinRouteAdvertisementInterval. A given BGP speaker shall apply the same jitter to each of these quantities regardless of the destinations to which the updates are being sent; that is, jitter will not be applied on a "per peer" basis.

The amount of jitter to be introduced shall be determined by multiplying the base value of the appropriate timer by a random factor which is uniformly distributed in the range from 0.75 to 1.0.

9.2.4 Efficient Organization of Routing Information

Having selected the routing information which it will advertise, a BGP speaker may avail itself of several methods to organize this information in an efficient manner.

9.2.4.1 Information Reduction

Information reduction may imply a reduction in granularity of policy control: after information is collapsed, the same policies will apply to all destinations and paths in the equivalence class.

The Decision Process may optionally reduce the amount of information that it will place in the Adj-RIBs-Out by any of the following methods:

   a) Network Layer Reachability Information (NLRI):

      Destination IP addresses can be represented as IP address prefixes.
In cases where there is a correspondence between the 2046 address structure and the systems under control of an autonomous 2047 system administrator, it will be possible to reduce the size of 2048 the NLRI carried in the UPDATE messages. 2050 b) AS_PATHs: 2052 AS path information can be represented as ordered AS_SEQUENCEs or 2053 unordered AS_SETs. AS_SETs are used in the route aggregation 2054 algorithm described in 9.2.4.2. They reduce the size of the 2055 AS_PATH information by listing each AS number only once, 2056 regardless of how many times it may have appeared in multiple 2057 AS_PATHs that were aggregated. 2059 An AS_SET implies that the destinations listed in the NLRI can be 2060 reached through paths that traverse at least some of the 2061 constituent autonomous systems. AS_SETs provide sufficient 2062 information to avoid routing information looping; however their 2063 use may prune potentially feasible paths, since such paths are no 2064 longer listed individually as in the form of AS_SEQUENCEs. In 2065 practice this is not likely to be a problem, since once an IP 2066 packet arrives at the edge of a group of autonomous systems, the 2067 BGP speaker at that point is likely to have more detailed path 2068 information and can distinguish individual paths to destinations. 2070 9.2.4.2 Aggregating Routing Information 2072 Aggregation is the process of combining the characteristics of 2073 several different routes in such a way that a single route can be 2074 advertised. Aggregation can occur as part of the decision process 2075 to reduce the amount of routing information that will be placed in 2076 the Adj-RIBs-Out. 2078 Aggregation reduces the amount of information that a BGP speaker must 2079 store and exchange with other BGP speakers. Routes can be aggregated 2080 by applying the following procedure separately to path attributes of 2081 like type and to the Network Layer Reachability Information. 
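   As a small illustration of the NLRI half of this procedure, the
   following Python sketch collapses prefixes with the standard
   library's ipaddress.collapse_addresses, which merges sibling
   prefixes into their common supernet and drops prefixes already
   covered by another, so it never claims an address that was not in
   the input. Grouping by NEXT_HOP and MULTI_EXIT_DISC reflects the
   restriction stated next. The (prefix, next_hop, med) tuple and the
   function name aggregate_nlri are hypothetical, all prefixes in a
   group are assumed to be of one IP version, and path attributes are
   not handled here.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical route representation: (prefix, NEXT_HOP, MULTI_EXIT_DISC or None).
Route = Tuple[str, str, Optional[int]]


def aggregate_nlri(routes: Iterable[Route]) -> Dict[Tuple[str, Optional[int]], List[str]]:
    """Collapse the prefixes of routes whose NEXT_HOP and MULTI_EXIT_DISC match."""
    groups = defaultdict(list)
    for prefix, next_hop, med in routes:
        groups[(next_hop, med)].append(ipaddress.ip_network(prefix))
    # collapse_addresses merges sibling prefixes into their common supernet
    # and drops prefixes already covered by another, so the result covers
    # exactly the same addresses as the input (all of one IP version).
    return {
        key: [str(net) for net in ipaddress.collapse_addresses(nets)]
        for key, nets in groups.items()
    }
```

   For example, 192.0.2.0/25 and 192.0.2.128/25 with the same NEXT_HOP
   and no MULTI_EXIT_DISC collapse to 192.0.2.0/24, while a route with a
   different NEXT_HOP stays in its own group.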
   Routes that have the following attributes shall not be aggregated
   unless the corresponding attributes of each route are identical:
   MULTI_EXIT_DISC, NEXT_HOP.

   Path attributes that have different type codes cannot be aggregated
   together. Path attributes of the same type code may be aggregated,
   according to the following rules:

      ORIGIN attribute: If at least one route among routes that are
      aggregated has ORIGIN with the value INCOMPLETE, then the
      aggregated route must have the ORIGIN attribute with the value
      INCOMPLETE. Otherwise, if at least one route among routes that are
      aggregated has ORIGIN with the value EGP, then the aggregated
      route must have the ORIGIN attribute with the value EGP. In all
      other cases the value of the ORIGIN attribute of the aggregated
      route is IGP.

      AS_PATH attribute: If routes to be aggregated have identical
      AS_PATH attributes, then the aggregated route has the same AS_PATH
      attribute as each individual route.

      For the purpose of aggregating AS_PATH attributes we model each AS
      within the AS_PATH attribute as a tuple <type, value>, where
      "type" identifies a type of the path segment the AS belongs to
      (e.g. AS_SEQUENCE, AS_SET), and "value" is the AS number. If the
      routes to be aggregated have different AS_PATH attributes, then
      the aggregated AS_PATH attribute shall satisfy all of the
      following conditions:

         - all tuples of the type AS_SEQUENCE in the aggregated AS_PATH
         shall appear in all of the AS_PATHs in the initial set of
         routes to be aggregated.

         - all tuples of the type AS_SET in the aggregated AS_PATH shall
         appear in at least one of the AS_PATHs in the initial set (they
         may appear as either AS_SET or AS_SEQUENCE types).
         - for any tuple X of the type AS_SEQUENCE in the aggregated
         AS_PATH which precedes tuple Y in the aggregated AS_PATH, X
         precedes Y in each AS_PATH in the initial set which contains Y,
         regardless of the type of Y.

         - No tuple with the same value shall appear more than once in
         the aggregated AS_PATH, regardless of the tuple's type.

      An implementation may choose any algorithm which conforms to these
      rules. At a minimum a conformant implementation shall be able to
      perform the following algorithm that meets all of the above
      conditions:

         - determine the longest leading sequence of tuples (as defined
         above) common to all the AS_PATH attributes of the routes to be
         aggregated. Make this sequence the leading sequence of the
         aggregated AS_PATH attribute.

         - set the type of the rest of the tuples from the AS_PATH
         attributes of the routes to be aggregated to AS_SET, and append
         them to the aggregated AS_PATH attribute.

         - if the aggregated AS_PATH has more than one tuple with the
         same value (regardless of tuple's type), eliminate all but one
         such tuple by deleting tuples of the type AS_SET from the
         aggregated AS_PATH attribute.

      Appendix 6, section 6.8 presents another algorithm that satisfies
      the conditions and allows for more complex policy configurations.

      ATOMIC_AGGREGATE: If at least one of the routes to be aggregated
      has the ATOMIC_AGGREGATE path attribute, then the aggregated route
      shall have this attribute as well.

      AGGREGATOR: All AGGREGATOR attributes of all routes to be
      aggregated should be ignored.

9.3 Route Selection Criteria

   Generally speaking, additional rules for comparing routes among
   several alternatives are outside the scope of this document. There
   are two exceptions:

      - If the local AS appears in the AS path of the new route being
      considered, then that new route cannot be viewed as better than
      any other route. If such a route were ever used, a routing loop
      would result.

      - In order to achieve successful distributed operation, only
      routes with a likelihood of stability can be chosen. Thus, an AS
      must avoid using unstable routes, and it must not make rapid
      spontaneous changes to its choice of route. Quantifying the terms
      "unstable" and "rapid" in the previous sentence will require
      experience, but the principle is clear.

9.4 Originating BGP routes

   A BGP speaker may originate BGP routes by injecting routing
   information acquired by some other means (e.g. via an IGP) into BGP.
   A BGP speaker that originates BGP routes shall assign the degree of
   preference to these routes by passing them through the Decision
   Process (see Section 9.1). These routes may also be distributed to
   other BGP speakers within the local AS as part of the Internal update
   process (see Section 9.2.1). The decision whether to distribute
   non-BGP acquired routes within an AS via BGP or not depends on the
   environment within the AS (e.g. type of IGP) and should be controlled
   via configuration.

Appendix 1. BGP FSM State Transitions and Actions.

   This Appendix discusses the transitions between states in the BGP FSM
   in response to BGP events. The states, events, and transitions listed
   here apply when the negotiated Hold Time value is non-zero.
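   As a compact illustration, the condensed state transition table in
   this appendix can be encoded as data and consulted with a single
   lookup. The following Python sketch covers only the next state, not
   the actions or messages of the full table, and assumes the non-zero
   Hold Time case. The names State, TRANSITIONS, and next_state, and the
   processed_ok flag used for the two entries that depend on whether an
   OPEN or UPDATE message passes its checks, are hypothetical.

```python
from enum import IntEnum
from typing import List, Optional


class State(IntEnum):
    IDLE = 1
    CONNECT = 2
    ACTIVE = 3
    OPEN_SENT = 4
    OPEN_CONFIRM = 5
    ESTABLISHED = 6


# Row i is event i + 1 (1 = BGP Start ... 13 = Receive NOTIFICATION message);
# column j is current state j + 1 (Idle ... Established). None marks the
# two entries whose outcome depends on processing the received message.
TRANSITIONS: List[List[Optional[int]]] = [
    [2, 2, 3, 4, 5, 6],        # 1  BGP Start
    [1, 1, 1, 1, 1, 1],        # 2  BGP Stop
    [1, 4, 4, 1, 1, 1],        # 3  transport connection open
    [1, 1, 1, 3, 1, 1],        # 4  transport connection closed
    [1, 3, 3, 1, 1, 1],        # 5  transport connection open failed
    [1, 1, 1, 1, 1, 1],        # 6  transport fatal error
    [1, 2, 2, 1, 1, 1],        # 7  ConnectRetry timer expired
    [1, 1, 1, 1, 1, 1],        # 8  Hold Timer expired
    [1, 1, 1, 1, 5, 6],        # 9  KeepAlive timer expired
    [1, 1, 1, None, 1, 1],     # 10 receive OPEN message
    [1, 1, 1, 1, 6, 6],        # 11 receive KEEPALIVE message
    [1, 1, 1, 1, 1, None],     # 12 receive UPDATE message
    [1, 1, 1, 1, 1, 1],        # 13 receive NOTIFICATION message
]


def next_state(state: State, event: int, processed_ok: bool = True) -> State:
    """Return the next FSM state for `event` received in `state`."""
    if not 1 <= event <= len(TRANSITIONS):
        raise ValueError(f"unknown BGP event {event}")
    entry = TRANSITIONS[event - 1][state - 1]
    if entry is None:
        # OpenSent + OPEN -> OpenConfirm; Established + UPDATE -> Established;
        # a message that fails its checks sends the FSM back to Idle.
        success = State.OPEN_CONFIRM if state == State.OPEN_SENT else State.ESTABLISHED
        return success if processed_ok else State.IDLE
    return State(entry)
```

   Keeping the table as data lets it be checked row by row against the
   condensed table; actions such as restarting timers would be attached
   separately.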
   BGP States:

        1 - Idle
        2 - Connect
        3 - Active
        4 - OpenSent
        5 - OpenConfirm
        6 - Established

   BGP Events:

        1 - BGP Start
        2 - BGP Stop
        3 - BGP Transport connection open
        4 - BGP Transport connection closed
        5 - BGP Transport connection open failed
        6 - BGP Transport fatal error
        7 - ConnectRetry timer expired
        8 - Hold Timer expired
        9 - KeepAlive timer expired
       10 - Receive OPEN message
       11 - Receive KEEPALIVE message
       12 - Receive UPDATE messages
       13 - Receive NOTIFICATION message

   The following table describes the state transitions of the BGP FSM
   and the actions triggered by these transitions.

      Event     Actions                          Message Sent  Next State
      --------------------------------------------------------------------
      Idle (1)
      1         Initialize resources             none          2
                Start ConnectRetry timer
                Initiate a transport connection
      others    none                             none          1

      Connect (2)
      1         none                             none          2
      3         Complete initialization          OPEN          4
                Clear ConnectRetry timer
      5         Restart ConnectRetry timer       none          3
      7         Restart ConnectRetry timer       none          2
                Initiate a transport connection
      others    Release resources                none          1

      Active (3)
      1         none                             none          3
      3         Complete initialization          OPEN          4
                Clear ConnectRetry timer
      5         Close connection                               3
                Restart ConnectRetry timer
      7         Restart ConnectRetry timer       none          2
                Initiate a transport connection
      others    Release resources                none          1

      OpenSent (4)
      1         none                             none          4
      4         Close transport connection       none          3
                Restart ConnectRetry timer
      6         Release resources                none          1
      10        Process OPEN is OK               KEEPALIVE     5
                Process OPEN failed              NOTIFICATION  1
      others    Close transport connection       NOTIFICATION  1
                Release resources

      OpenConfirm (5)
      1         none                             none          5
      4         Release resources                none          1
      6         Release resources                none          1
      9         Restart KeepAlive timer          KEEPALIVE     5
      11        Complete initialization          none          6
                Restart Hold Timer
      13        Close transport connection                     1
                Release resources
      others    Close transport connection       NOTIFICATION  1
                Release resources

      Established (6)
      1         none                             none          6
      4         Release resources                none          1
      6         Release resources                none          1
      9         Restart KeepAlive timer          KEEPALIVE     6
      11        Restart Hold Timer               KEEPALIVE     6
      12        Process UPDATE is OK             UPDATE        6
                Process UPDATE failed            NOTIFICATION  1
      13        Close transport connection                     1
                Release resources
      others    Close transport connection       NOTIFICATION  1
                Release resources
      --------------------------------------------------------------------

   The following is a condensed version of the above state transition
   table.

      Events| Idle | Connect | Active | OpenSent | OpenConfirm | Estab
            | (1)  |   (2)   |  (3)   |   (4)    |     (5)     |  (6)
            |---------------------------------------------------------
         1  |  2   |    2    |   3    |    4     |      5      |   6
         2  |  1   |    1    |   1    |    1     |      1      |   1
         3  |  1   |    4    |   4    |    1     |      1      |   1
         4  |  1   |    1    |   1    |    3     |      1      |   1
         5  |  1   |    3    |   3    |    1     |      1      |   1
         6  |  1   |    1    |   1    |    1     |      1      |   1
         7  |  1   |    2    |   2    |    1     |      1      |   1
         8  |  1   |    1    |   1    |    1     |      1      |   1
         9  |  1   |    1    |   1    |    1     |      5      |   6
        10  |  1   |    1    |   1    |  1 or 5  |      1      |   1
        11  |  1   |    1    |   1    |    1     |      6      |   6
        12  |  1   |    1    |   1    |    1     |      1      | 1 or 6
        13  |  1   |    1    |   1    |    1     |      1      |   1
            ----------------------------------------------------------

Appendix 2. Comparison with RFC 1267

   BGP-4 is capable of operating in an environment where a set of
   reachable destinations may be expressed via a single IP prefix. The
   concepts of network classes and subnetting are foreign to BGP-4. To
   accommodate these capabilities, BGP-4 changes the semantics and
   encoding associated with the AS_PATH attribute.
   New text has been added to define semantics associated with IP
   prefixes. These abilities allow BGP-4 to support the proposed
   supernetting scheme [9].

   To simplify configuration, this version introduces a new attribute,
   LOCAL_PREF, that facilitates route selection procedures.

   The INTER_AS_METRIC attribute has been renamed MULTI_EXIT_DISC. A
   new attribute, ATOMIC_AGGREGATE, has been introduced to ensure that
   certain aggregates are not de-aggregated. Another new attribute,
   AGGREGATOR, can be added to aggregate routes in order to advertise
   which AS and which BGP speaker within that AS caused the aggregation.

   To ensure that Hold Timers are symmetric, the Hold Time is now
   negotiated on a per-connection basis. Hold Times of zero are now
   supported.

Appendix 3. Comparison with RFC 1163

   All of the changes listed in Appendix 2, plus the following.

   To detect and recover from BGP connection collision, a new field (BGP
   Identifier) has been added to the OPEN message. New text (Section
   6.8) has been added to specify the procedure for detecting and
   recovering from collision.

   The new document no longer restricts the border router that is passed
   in the NEXT_HOP path attribute to be part of the same Autonomous
   System as the BGP Speaker.

   The new document optimizes and simplifies the exchange of the
   information about previously reachable routes.

Appendix 4. Comparison with RFC 1105

   All of the changes listed in Appendices 2 and 3, plus the following.

   Minor changes to the RFC 1105 Finite State Machine were necessary to
   accommodate the TCP user interface provided by 4.3 BSD.

   The notion of Up/Down/Horizontal relations present in RFC 1105 has
   been removed from the protocol.

   The changes in the message format from RFC 1105 are as follows:

      1. The Hold Time field has been removed from the BGP header and
      added to the OPEN message.

      2. The version field has been removed from the BGP header and
      added to the OPEN message.

      3. The Link Type field has been removed from the OPEN message.

      4. The OPEN CONFIRM message has been eliminated and replaced with
      implicit confirmation provided by the KEEPALIVE message.

      5. The format of the UPDATE message has been changed
      significantly. New fields were added to the UPDATE message to
      support multiple path attributes.

      6. The Marker field has been expanded and its role broadened to
      support authentication.

   Note that BGP as specified in RFC 1105 is often referred to as BGP-1,
   BGP as specified in RFC 1163 as BGP-2, BGP as specified in RFC 1267
   as BGP-3, and BGP as specified in this document as BGP-4.

Appendix 5. TCP options that may be used with BGP

   If a local system TCP user interface supports the TCP PUSH function,
   then each BGP message should be transmitted with the PUSH flag set.
   Setting the PUSH flag forces BGP messages to be transmitted promptly
   to the receiver.

   If a local system TCP user interface supports setting precedence for
   the TCP connection, then the BGP transport connection should be
   opened with precedence set to the Internetwork Control (110) value
   (see also [6]).

Appendix 6. Implementation Recommendations

   This section presents some implementation recommendations.

6.1 Multiple Networks Per Message

   The BGP protocol allows for multiple address prefixes with the same
   AS path and next-hop gateway to be specified in one message. Making
   use of this capability is highly recommended. With one address prefix
   per message there is a substantial increase in overhead in the
   receiver.
   Not only does the system overhead increase due to the reception of
   multiple messages, but the overhead of scanning the routing table
   for updates to BGP peers and other routing protocols (and sending
   the associated messages) is incurred multiple times as well. One
   method of building messages containing many address prefixes per AS
   path and gateway from a routing table that is not organized per AS
   path is to build many messages as the routing table is scanned. As
   each address prefix is processed, a message for the associated AS
   path and gateway is allocated if one does not already exist, and the
   new address prefix is appended to that message. If the message lacks
   the space to hold the new address prefix, it is transmitted, a new
   message is allocated, and the new address prefix is inserted into the
   new message. When the entire routing table has been scanned, all
   allocated messages are sent and their resources released. Maximum
   compression is achieved when all the destinations covered by the
   address prefixes share a gateway and common path attributes, making
   it possible to send many address prefixes in one 4096-byte message.

   When peering with a BGP implementation that does not compress
   multiple address prefixes into one message, it may be necessary to
   take steps to reduce the overhead from the flood of data received
   when a peer is acquired or a significant network topology change
   occurs. One method of doing this is to limit the rate of updates.
   This will eliminate the redundant scanning of the routing table to
   provide flash updates for BGP peers and other routing protocols. A
   disadvantage of this approach is that it increases the propagation
   latency of routing information.
   By choosing a minimum flash update interval that is not much greater
   than the time it takes to process the multiple messages, this
   latency should be minimized. A better method would be to read all
   received messages before sending updates.

6.2 Processing Messages on a Stream Protocol

   BGP uses TCP as a transport mechanism. Due to the stream nature of
   TCP, the data for a received message does not necessarily all arrive
   at the same time. This can make it difficult to process the data as
   messages, especially on systems such as BSD Unix where it is not
   possible to determine how much data has been received but not yet
   processed.

   One method that can be used in this situation is to first try to read
   just the message header. For the KEEPALIVE message type, this is a
   complete message; for other message types, the header should first be
   verified, in particular the total length. If all checks are
   successful, the specified length, minus the size of the message
   header, is the amount of data left to read. An implementation that
   would "hang" the routing information process while trying to read
   from a peer could set up a message buffer (4096 bytes) per peer and
   fill it with data as available until a complete message has been
   received.

6.3 Reducing route flapping

   To avoid excessive route flapping, a BGP speaker which needs to
   withdraw a destination and send an update about a more specific or
   less specific route shall combine them into the same UPDATE message.

6.4 BGP Timers

   BGP employs five timers: ConnectRetry, Hold Time, KeepAlive,
   MinASOriginationInterval, and MinRouteAdvertisementInterval. The
   suggested value for the ConnectRetry timer is 120 seconds. The
   suggested value for the Hold Time is 90 seconds. The suggested value
   for the KeepAlive timer is 30 seconds. The suggested value for the
   MinASOriginationInterval is 15 seconds.
   The suggested value for the MinRouteAdvertisementInterval is 30
   seconds.

   An implementation of BGP MUST allow these timers to be configurable.

6.5 Path attribute ordering

   Implementations which combine update messages as described above in
   6.1 may prefer to see all path attributes presented in a known order.
   This permits them to quickly identify sets of attributes from
   different update messages which are semantically identical. To
   facilitate this, it is a useful optimization to order the path
   attributes according to type code. This optimization is entirely
   optional.

6.6 AS_SET sorting

   Another useful optimization that can be done to simplify this
   comparison is to sort the AS numbers found in an AS_SET. This
   optimization is entirely optional.

6.7 Control over version negotiation

   Since BGP-4 is capable of carrying aggregated routes which cannot be
   properly represented in BGP-3, an implementation which supports BGP-4
   and another BGP version should provide the capability to only speak
   BGP-4 on a per-peer basis.

6.8 Complex AS_PATH aggregation

   An implementation which chooses to provide a path aggregation
   algorithm which retains significant amounts of path information may
   wish to use the following procedure:

      For the purpose of aggregating AS_PATH attributes of two routes,
      we model each AS as a tuple <type, value>, where "type" identifies
      a type of the path segment the AS belongs to (e.g. AS_SEQUENCE,
      AS_SET), and "value" is the AS number. Two ASs are said to be the
      same if their corresponding tuples are the same.

      The algorithm to aggregate two AS_PATH attributes works as
      follows:

         a) Identify the same ASs (as defined above) within each AS_PATH
         attribute that are in the same relative order within both
         AS_PATH attributes.
         Two ASs, X and Y, are said to be in the same order if either:

            - X precedes Y in both AS_PATH attributes, or
            - Y precedes X in both AS_PATH attributes.

         b) The aggregated AS_PATH attribute consists of ASs identified
         in (a) in exactly the same order as they appear in the AS_PATH
         attributes to be aggregated. If two consecutive ASs identified
         in (a) do not immediately follow each other in both of the
         AS_PATH attributes to be aggregated, then the intervening ASs
         (ASs that are between the two consecutive ASs that are the
         same) in both attributes are combined into an AS_SET path
         segment that consists of the intervening ASs from both AS_PATH
         attributes; this segment is then placed in between the two
         consecutive ASs identified in (a) of the aggregated attribute.
         If two consecutive ASs identified in (a) immediately follow
         each other in one attribute, but do not follow each other in
         the other, then the intervening ASs of the latter are combined
         into an AS_SET path segment; this segment is then placed in
         between the two consecutive ASs identified in (a) of the
         aggregated attribute.

      If as a result of the above procedure a given AS number appears
      more than once within the aggregated AS_PATH attribute, all but
      the last instance (rightmost occurrence) of that AS number should
      be removed from the aggregated AS_PATH attribute.

References

   [1] Mills, D., "Exterior Gateway Protocol Formal Specification", RFC
       904, BBN, April 1984.

   [2] Rekhter, Y., "EGP and Policy Based Routing in the New NSFNET
       Backbone", RFC 1092, T.J. Watson Research Center, February 1989.

   [3] Braun, H-W., "The NSFNET Routing Architecture", RFC 1093,
       MERIT/NSFNET Project, February 1989.

   [4] Postel, J., "Transmission Control Protocol - DARPA Internet
       Program Protocol Specification", RFC 793, DARPA, September 1981.

   [5] Rekhter, Y., and P. Gross, "Application of the Border Gateway
       Protocol in the Internet", T.J. Watson Research Center, IBM
       Corp., MCI, Internet Draft.

   [6] Postel, J., "Internet Protocol - DARPA Internet Program Protocol
       Specification", RFC 791, DARPA, September 1981.

   [7] "Information Processing Systems - Telecommunications and
       Information Exchange between Systems - Protocol for Exchange of
       Inter-domain Routeing Information among Intermediate Systems to
       Support Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993.

   [8] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless
       Inter-Domain Routing (CIDR): an Address Assignment and
       Aggregation Strategy", RFC 1519, BARRNet, cisco, MERIT, OARnet,
       September 1993.

   [9] Rekhter, Y., and T. Li, "An Architecture for IP Address
       Allocation with CIDR", RFC 1518, T.J. Watson Research Center,
       cisco, September 1993.

Security Considerations

   Security issues are not discussed in this document.

Editors' Addresses

   Yakov Rekhter
   cisco Systems, Inc.
   170 W. Tasman Dr.
   San Jose, CA 95134
   email: yakov@cisco.com

   Tony Li
   Juniper Networks, Inc.
   3260 Jay St.
   Santa Clara, CA 95051
   (408) 327-1906
   email: tli@jnx.com