Network Working Group                                         Y. Rekhter
INTERNET DRAFT                                                     T. Li
Obsoletes: 1771                                                 S. Hares
                                                                 Editors
RFC DRAFT                                                   October 2004

                  A Border Gateway Protocol 4 (BGP-4)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

The Border Gateway Protocol (BGP) is an inter-Autonomous System routing protocol.

The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems. This network reachability information includes information on the list of Autonomous Systems (ASs) that reachability information traverses. This information is sufficient to construct a graph of AS connectivity for this reachability, from which routing loops may be pruned and some policy decisions at the AS level may be enforced.

BGP-4 provides a set of mechanisms for supporting Classless Inter-Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms include support for advertising a set of destinations as an IP prefix, and eliminating the concept of network "class" within BGP. BGP-4 also introduces mechanisms that allow aggregation of routes, including aggregation of AS paths.
Routing information exchanged via BGP supports only the destination-based forwarding paradigm, which assumes that a router forwards a packet based solely on the destination address carried in the IP header of the packet. This, in turn, reflects the set of policy decisions that can (and cannot) be enforced using BGP. BGP can support only the policies conforming to the destination-based forwarding paradigm.

This specification covers only the exchange of IP version 4 network reachability information.

This document obsoletes RFC1771.

Table of Contents

   1. Definition of commonly used terms
   2. Acknowledgments
   Specification of Requirements
   3. Summary of Operation
   3.1 Routes: Advertisement and Storage
   3.2 Routing Information Bases
   4. Message Formats
   4.1 Message Header Format
   4.2 OPEN Message Format
   4.3 UPDATE Message Format
   4.4 KEEPALIVE Message Format
   4.5 NOTIFICATION Message Format
   5. Path Attributes
   5.1 Path Attribute Usage
   5.1.1 ORIGIN
   5.1.2 AS_PATH
   5.1.3 NEXT_HOP
   5.1.4 MULTI_EXIT_DISC
   5.1.5 LOCAL_PREF
   5.1.6 ATOMIC_AGGREGATE
   5.1.7 AGGREGATOR
   6. BGP Error Handling
   6.1 Message Header error handling
   6.2 OPEN message error handling
   6.3 UPDATE message error handling
   6.4 NOTIFICATION message error handling
   6.5 Hold Timer Expired error handling
   6.6 Finite State Machine error handling
   6.7 Cease
   6.8 BGP connection collision detection
   7. BGP Version Negotiation
   8. BGP Finite State machine
   8.1 Events for the BGP FSM
   8.1.1 Optional Events linked to Optional Session attributes
   8.1.2 Administrative Events
   8.1.3 Timer Events
   8.1.4 TCP connection based Events
   8.1.5 BGP Messages based Events
   8.2 Description of FSM
   8.2.1 FSM Definition
   8.2.1.1 Terms "active" and "passive"
   8.2.1.2 FSM and collision detection
   8.2.1.3 FSM and Optional Attributes
   8.2.1.4 FSM Event numbers
   8.2.1.5 FSM actions that are implementation dependent
   8.2.2 Finite State Machine
   9. UPDATE Message Handling
   9.1 Decision Process
   9.1.1 Phase 1: Calculation of Degree of Preference
   9.1.2 Phase 2: Route Selection
   9.1.2.1 Route Resolvability Condition
   9.1.2.2 Breaking Ties (Phase 2)
   9.1.3 Phase 3: Route Dissemination
   9.1.4 Overlapping Routes
   9.2 Update-Send Process
   9.2.1 Controlling Routing Traffic Overhead
   9.2.1.1 Frequency of Route Advertisement
   9.2.1.2 Frequency of Route Origination
   9.2.2 Efficient Organization of Routing Information
   9.2.2.1 Information Reduction
   9.2.2.2 Aggregating Routing Information
   9.3 Route Selection Criteria
   9.4 Originating BGP routes
   10. BGP Timers
   Appendix A. Comparison with RFC1771
   Appendix B. Comparison with RFC1267
   Appendix C. Comparison with RFC 1163
   Appendix D. Comparison with RFC 1105
   Appendix E. TCP options that may be used with BGP
   Appendix F. Implementation Recommendations
   Appendix F.1 Multiple Networks Per Message
   Appendix F.2 Reducing route flapping
   Appendix F.3 Path attribute ordering
   Appendix F.4 AS_SET sorting
   Appendix F.5 Control over version negotiation
   Appendix F.6 Complex AS_PATH aggregation
   Security Considerations
   IANA Considerations
   IPR Disclosure Acknowledgement
   Copyright Notice
   Normative References
   Non-normative References
   Authors Information

1. Definition of commonly used terms

This section provides definitions for terms that have a specific meaning to the BGP protocol and that are used throughout the text.

Adj-RIB-In
   The Adj-RIBs-In contain unprocessed routing information that has been advertised to the local BGP speaker by its peers.

Adj-RIB-Out
   The Adj-RIBs-Out contain the routes for advertisement to specific peers by means of the local speaker's UPDATE messages.

Autonomous System (AS)
   The classic definition of an Autonomous System is a set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to determine how to route packets within the AS, and using an inter-AS routing protocol to determine how to route packets to other ASs. Since this classic definition was developed, it has become common for a single AS to use several IGPs and sometimes several sets of metrics within an AS. The use of the term Autonomous System here stresses the fact that, even when multiple IGPs and metrics are used, the administration of an AS appears to other ASs to have a single coherent interior routing plan and presents a consistent picture of what destinations are reachable through it.

BGP Identifier
   A 4-octet unsigned integer indicating the BGP Identifier of the sender of BGP messages. A given BGP speaker sets the value of its BGP Identifier to an IP address assigned to that BGP speaker. The value of the BGP Identifier is determined on startup and is the same for every local interface and every BGP peer.

BGP speaker
   A router that implements BGP.

EBGP
   External BGP (BGP connection between external peers).
External peer
   A peer that is in a different Autonomous System than the local system.

Feasible route
   An advertised route that is available for use by the recipient.

IBGP
   Internal BGP (BGP connection between internal peers).

Internal peer
   A peer that is in the same Autonomous System as the local system.

IGP
   Interior Gateway Protocol: a routing protocol used to exchange routing information among routers within a single Autonomous System.

Loc-RIB
   The Loc-RIB contains the routes that have been selected by the local BGP speaker's Decision Process.

NLRI
   Network Layer Reachability Information.

Route
   A unit of information that pairs a set of destinations with the attributes of a path to those destinations. The destinations are the systems whose IP addresses are contained in one IP address prefix carried in the Network Layer Reachability Information (NLRI) field of an UPDATE message. The path is the information reported in the path attributes field of the same UPDATE message.

RIB
   Routing Information Base.

Unfeasible route
   A previously advertised feasible route that is no longer available for use.

2. Acknowledgments

This document was originally published as RFC 1267 in October 1991, jointly authored by Kirk Lougheed and Yakov Rekhter.

We would like to express our thanks to Guy Almes, Len Bosack, and Jeffrey C. Honig for their contributions to the earlier version (BGP-1) of this document.

We would like to especially acknowledge numerous contributions by Dennis Ferguson to the earlier version of this document.

We would like to explicitly thank Bob Braden for his review of the earlier version (BGP-2) of this document, as well as for his constructive and valuable comments.
We would also like to thank Bob Hinden, Director for Routing of the Internet Engineering Steering Group, and the team of reviewers he assembled to review the earlier version (BGP-2) of this document. This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted with a strong combination of toughness, professionalism, and courtesy.

Certain sections of the document borrowed heavily from IDRP [IS10747], which is the OSI counterpart of BGP. For this, credit should be given to the ANSI X3S3.3 group chaired by Lyman Chapin and to Charles Kunzinger, who was the IDRP editor within that group.

We would also like to thank Benjamin Abarbanel, Enke Chen, Edward Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry Haskin, Stephen Kent, John Krawczyk, David LeRoy, Dan Massey, Jonathan Natale, Dan Pei, Mathew Richardson, John Scudder, John Stewart III, Dave Thaler, Paul Traina, Russ White, Curtis Villamizar, and Alex Zinin for their comments.

We would like to especially acknowledge Andrew Lange for his help in preparing the final version of this document.

Finally, we would like to thank all the members of the IDR Working Group for the ideas and support they have given to this document.

Specification of Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Summary of Operation

The Border Gateway Protocol (BGP) is an inter-Autonomous System routing protocol. It is built on experience gained with EGP, as defined in [RFC904], and with EGP usage in the NSFNET Backbone, as described in [RFC1092] and [RFC1093].
The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems. This network reachability information includes information on the list of Autonomous Systems (ASs) that reachability information traverses. This information is sufficient to construct a graph of AS connectivity from which routing loops may be pruned and some policy decisions at the AS level may be enforced.

In the context of this document, we assume that a BGP speaker advertises to its peers only those routes that it itself uses (in this context, a BGP speaker is said to "use" a BGP route if it is the most preferred BGP route and is used in forwarding). All other cases are outside the scope of this document.

In the context of this document, the term "IP address" refers to an IP Version 4 address [RFC791].

Routing information exchanged via BGP supports only the destination-based forwarding paradigm, which assumes that a router forwards a packet based solely on the destination address carried in the IP header of the packet. This, in turn, reflects the set of policy decisions that can (and cannot) be enforced using BGP. Note that some policies cannot be supported by the destination-based forwarding paradigm, and thus require techniques such as source routing (also known as explicit routing) to be enforced. Such policies cannot be enforced using BGP either. For example, BGP does not enable one AS to send traffic to a neighboring AS, for forwarding to some destination reachable through (but beyond) that neighboring AS, with the intention that the traffic take a different route from the one taken by traffic originating in the neighboring AS for that same destination. On the other hand, BGP can support any policy conforming to the destination-based forwarding paradigm.
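As an illustration of the destination-based forwarding paradigm, the following Python sketch chooses a next hop using nothing but the destination address. It assumes a hypothetical forwarding table (FORWARDING_TABLE) and helper (next_hop), uses longest-prefix match to choose among overlapping prefixes, and takes its addresses from the RFC 5737 documentation ranges; it is not a description of any particular router's forwarding code.

```python
import ipaddress
from typing import Optional

# Hypothetical forwarding table: prefix -> next-hop address.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "198.51.100.1",
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.2",
    ipaddress.ip_network("203.0.113.128/25"): "198.51.100.3",
}


def next_hop(destination: str) -> Optional[str]:
    """Choose a next hop from the packet's destination address alone.

    Among all prefixes that contain the destination, the most specific
    (longest) one wins. No other header field is consulted.
    """
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]


if __name__ == "__main__":
    print(next_hop("203.0.113.200"))  # covered by the /25
    print(next_hop("203.0.113.5"))    # covered by the /24 only
    print(next_hop("192.0.2.1"))      # falls back to the default route
```

Because the source address and the other header fields never enter the lookup, a policy that distinguishes traffic by where it came from has nothing to key on, which is why such policies fall outside what BGP can enforce.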
BGP-4 provides a new set of mechanisms for supporting Classless Inter-Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms include support for advertising a set of destinations as an IP prefix and eliminating the concept of network "class" within BGP. BGP-4 also introduces mechanisms that allow aggregation of routes, including aggregation of AS paths.

This document uses the term "Autonomous System" (AS) throughout. The classic definition of an Autonomous System is a set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to determine how to route packets within the AS, and using an inter-AS routing protocol to determine how to route packets to other ASs. Since this classic definition was developed, it has become common for a single AS to use several IGPs and sometimes several sets of metrics within an AS. The use of the term Autonomous System here stresses the fact that, even when multiple IGPs and metrics are used, the administration of an AS appears to other ASs to have a single coherent interior routing plan and presents a consistent picture of what destinations are reachable through it.

BGP uses TCP [RFC793] as its transport protocol. This eliminates the need to implement explicit update fragmentation, retransmission, acknowledgment, and sequencing. BGP listens on TCP port 179. The error notification mechanism used in BGP assumes that TCP supports a "graceful" close, i.e., that all outstanding data will be delivered before the connection is closed.

Two systems form a TCP connection between one another. They exchange messages to open and confirm the connection parameters.

The initial data flow is the portion of the BGP routing table that is allowed by the export policy, called the Adj-RIBs-Out (see Section 3.2). Incremental updates are sent as the routing tables change.
BGP does not require periodic refresh of the routing table. To allow local policy changes to have the correct effect without resetting any BGP connections, a BGP speaker SHOULD either (a) retain the current version of the routes advertised to it by all of its peers for the duration of the connection, or (b) make use of the Route Refresh extension [RFC2918].

KEEPALIVE messages may be sent periodically to ensure that the connection is still alive. NOTIFICATION messages are sent in response to errors or special conditions. If a connection encounters an error condition, a NOTIFICATION message is sent and the connection is closed.

A peer in a different AS is referred to as an external peer, while a peer in the same AS is referred to as an internal peer. Internal BGP and external BGP are commonly abbreviated as IBGP and EBGP.

If a particular AS has multiple BGP speakers and is providing transit service for other ASs, then care must be taken to ensure a consistent view of routing within the AS. A consistent view of the interior routes of the AS is provided by the IGP used within the AS. For the purpose of this document, it is assumed that a consistent view of the routes exterior to the AS is provided by having all BGP speakers within the AS maintain IBGP with each other.

This document specifies the base behavior of the BGP protocol. This behavior can be, and is, modified by extension specifications. When the protocol is extended, the new behavior is fully documented in the extension specifications.

3.1 Routes: Advertisement and Storage

For the purpose of this protocol, a route is defined as a unit of information that pairs a set of destinations with the attributes of a path to those destinations.
The destinations are the systems whose IP addresses are contained in one IP address prefix carried in the Network Layer Reachability Information (NLRI) field of an UPDATE message, and the path is the information reported in the path attributes field of the same UPDATE message.

Routes are advertised between BGP speakers in UPDATE messages. Multiple routes that have the same path attributes can be advertised in a single UPDATE message by including multiple prefixes in the NLRI field of the UPDATE message.

Routes are stored in the Routing Information Bases (RIBs): namely, the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out, as described in Section 3.2.

If a BGP speaker chooses to advertise a previously received route, it MAY add to or modify the path attributes of the route before advertising it to a peer.

BGP provides mechanisms by which a BGP speaker can inform its peer that a previously advertised route is no longer available for use. There are three methods by which a given BGP speaker can indicate that a route has been withdrawn from service:

   a) the IP prefix that expresses the destination for a previously advertised route can be advertised in the WITHDRAWN ROUTES field in the UPDATE message, thus marking the associated route as being no longer available for use,

   b) a replacement route with the same NLRI can be advertised, or

   c) the BGP connection between the two speakers can be closed, which implicitly removes from service all routes that the pair of speakers had advertised to each other.

Changing the attribute(s) of a route is accomplished by advertising a replacement route. The replacement route carries new (changed) attributes and has the same address prefix as the original route.
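The three withdrawal methods above can be modeled with a minimal Python sketch of the routes learned from a single peer. The AdjRibIn class, its method names, and the representation of path attributes as a plain dictionary are hypothetical simplifications; the AS numbers come from the documentation range reserved by RFC 5398 and the prefixes from RFC 5737.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class AdjRibIn:
    """Routes learned from one peer, keyed by prefix (hypothetical model)."""

    routes: Dict[str, dict] = field(default_factory=dict)

    def apply_update(self, withdrawn: Iterable[str], attributes: dict,
                     nlri: Iterable[str]) -> None:
        # (a) Prefixes in the WITHDRAWN ROUTES field are removed from service.
        for prefix in withdrawn:
            self.routes.pop(prefix, None)
        # (b) Re-advertising a prefix replaces the previous route for the
        # same NLRI; this is also how a route's attributes are changed.
        for prefix in nlri:
            self.routes[prefix] = dict(attributes)

    def connection_closed(self) -> None:
        # (c) Closing the connection removes every route this peer advertised.
        self.routes.clear()


if __name__ == "__main__":
    rib = AdjRibIn()
    rib.apply_update([], {"AS_PATH": [64496]}, ["203.0.113.0/24", "198.51.100.0/24"])
    rib.apply_update([], {"AS_PATH": [64496, 64497]}, ["203.0.113.0/24"])
    rib.apply_update(["198.51.100.0/24"], {}, [])
    print(rib.routes)  # {'203.0.113.0/24': {'AS_PATH': [64496, 64497]}}
    rib.connection_closed()
    print(rib.routes)  # {}
```

The sketch processes withdrawals before the advertised NLRI, so a prefix that appears in both fields of the same UPDATE ends up advertised; that ordering is a choice made here for illustration.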
3.2 Routing Information Bases

The Routing Information Base (RIB) within a BGP speaker consists of three distinct parts:

   a) Adj-RIBs-In: The Adj-RIBs-In store routing information learned from inbound UPDATE messages received from other BGP speakers. Their contents represent routes that are available as input to the Decision Process.

   b) Loc-RIB: The Loc-RIB contains the local routing information that the BGP speaker has selected by applying its local policies to the routing information contained in its Adj-RIBs-In. These are the routes that will be used by the local BGP speaker. The next hop for each of these routes MUST be resolvable via the local BGP speaker's Routing Table.

   c) Adj-RIBs-Out: The Adj-RIBs-Out store the information that the local BGP speaker has selected for advertisement to its peers. The routing information stored in the Adj-RIBs-Out will be carried in the local BGP speaker's UPDATE messages and advertised to its peers.

In summary, the Adj-RIBs-In contain unprocessed routing information that has been advertised to the local BGP speaker by its peers; the Loc-RIB contains the routes that have been selected by the local BGP speaker's Decision Process; and the Adj-RIBs-Out organize the routes for advertisement to specific peers by means of the local speaker's UPDATE messages.

Although the conceptual model distinguishes between Adj-RIBs-In, Loc-RIB, and Adj-RIBs-Out, this neither implies nor requires that an implementation must maintain three separate copies of the routing information. The choice of implementation (for example, three copies of the information versus one copy with pointers) is not constrained by the protocol.
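To make the implementation freedom described above concrete, here is a minimal Python sketch of the "one copy with pointers" option: each Route object is stored once, and the Adj-RIBs-In, Loc-RIB, and Adj-RIBs-Out hold references to it. The class and method names, the shortest-AS-path preference, and the export rule are hypothetical placeholders chosen for brevity; they stand in for the real Decision Process and local policy, which this sketch does not implement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    peer: str                  # peer the route was learned from
    as_path: Tuple[int, ...]


class Ribs:
    """Adj-RIBs-In, Loc-RIB, and Adj-RIBs-Out sharing one copy of each Route."""

    def __init__(self, peers: List[str]) -> None:
        self.adj_ribs_in: Dict[str, Dict[str, Route]] = {p: {} for p in peers}
        self.loc_rib: Dict[str, Route] = {}
        self.adj_ribs_out: Dict[str, Dict[str, Route]] = {p: {} for p in peers}

    def learn(self, route: Route) -> None:
        """Store a route received from a peer and re-run selection for its prefix."""
        self.adj_ribs_in[route.peer][route.prefix] = route
        self._select(route.prefix)

    def _select(self, prefix: str) -> None:
        candidates = [rib[prefix] for rib in self.adj_ribs_in.values() if prefix in rib]
        # Placeholder preference (shortest AS path); this is NOT the
        # Decision Process defined in Section 9.1.
        best = min(candidates, key=lambda r: len(r.as_path))
        self.loc_rib[prefix] = best
        # Placeholder export policy: offer the selected route to every
        # peer other than the one it was learned from.
        for peer, out in self.adj_ribs_out.items():
            if peer == best.peer:
                out.pop(prefix, None)
            else:
                out[prefix] = best


if __name__ == "__main__":
    ribs = Ribs(["A", "B", "C"])
    ribs.learn(Route("203.0.113.0/24", "A", (64496, 64497)))
    ribs.learn(Route("203.0.113.0/24", "B", (64498,)))
    print(ribs.loc_rib["203.0.113.0/24"].peer)  # B
```

All three tables point at the same Route instance, so an entry is never duplicated; an implementation that copies routes into each RIB would be equally conformant.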
Routing information that the BGP speaker uses to forward packets (or to construct the forwarding table that is used for packet forwarding) is maintained in the Routing Table. The Routing Table accumulates routes to directly connected networks, static routes, routes learned from IGPs, and routes learned from BGP. Whether a specific BGP route should be installed in the Routing Table, and whether a BGP route should override a route to the same destination installed by another source, is a local policy decision, not specified in this document. Besides actual packet forwarding, the Routing Table is used for resolution of the next-hop addresses specified in BGP updates (see Section 5.1.3).

4. Message Formats

This section describes message formats used by BGP.

BGP messages are sent over a TCP connection. A message is processed only after it is entirely received. The maximum message size is 4096 octets. All implementations are required to support this maximum message size. The smallest message that may be sent consists of a BGP header without a data portion (19 octets).

All multi-octet fields are in network byte order.

4.1 Message Header Format

Each message has a fixed-size header. There may or may not be a data portion following the header, depending on the message type. The layout of these fields is shown below:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                                                               +
      |                                                               |
      +                                                               +
      |                           Marker                              |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Length               |      Type     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Marker:

      This 16-octet field is included for compatibility; it MUST be set to all ones.
   Length:

      This 2-octet unsigned integer indicates the total length of the message, including the header, in octets. Thus, it allows one to locate the (Marker field of the) next message in the TCP stream. The value of the Length field MUST always be at least 19 and no greater than 4096, and MAY be further constrained, depending on the message type. No "padding" of extra data after the message is allowed, so the Length field MUST have the smallest value required, given the rest of the message.

   Type:

      This 1-octet unsigned integer indicates the type code of the message. This document defines the following type codes:

         1 - OPEN
         2 - UPDATE
         3 - NOTIFICATION
         4 - KEEPALIVE

      [RFC2918] defines one more type code.

4.2 OPEN Message Format

After a TCP connection is established, the first message sent by each side is an OPEN message. If the OPEN message is acceptable, a KEEPALIVE message confirming the OPEN is sent back.

In addition to the fixed-size BGP header, the OPEN message contains the following fields:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+
      |    Version    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     My Autonomous System      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Hold Time           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         BGP Identifier                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Opt Parm Len  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |             Optional Parameters (variable)                    |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Version:

      This 1-octet unsigned integer indicates the protocol version number of the message. The current BGP version number is 4.
   My Autonomous System:

      This 2-octet unsigned integer indicates the Autonomous System number of the sender.

   Hold Time:

      This 2-octet unsigned integer indicates the number of seconds the sender proposes for the value of the Hold Timer. Upon receipt of an OPEN message, a BGP speaker MUST calculate the value of the Hold Timer by using the smaller of its configured Hold Time and the Hold Time received in the OPEN message. The Hold Time MUST be either zero or at least three seconds. An implementation MAY reject connections on the basis of the Hold Time. The calculated value indicates the maximum number of seconds that may elapse between the receipt of successive KEEPALIVE and/or UPDATE messages from the sender.

   BGP Identifier:

      This 4-octet unsigned integer indicates the BGP Identifier of the sender. A given BGP speaker sets the value of its BGP Identifier to an IP address assigned to that BGP speaker. The value of the BGP Identifier is determined on startup and is the same for every local interface and every BGP peer.

   Optional Parameters Length:

      This 1-octet unsigned integer indicates the total length of the Optional Parameters field in octets. If the value of this field is zero, no Optional Parameters are present.

   Optional Parameters:

      This field contains a list of optional parameters, in which each parameter is encoded as a <Parameter Type, Parameter Length, Parameter Value> triplet.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
      |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

      Parameter Type is a one-octet field that unambiguously identifies individual parameters. Parameter Length is a one-octet field that contains the length of the Parameter Value field in octets.
      Parameter Value is a variable-length field that is interpreted according to the value of the Parameter Type field.

      [RFC3392] defines the Capabilities Optional Parameter.

The minimum length of the OPEN message is 29 octets (including the message header).

4.3 UPDATE Message Format

UPDATE messages are used to transfer routing information between BGP peers. The information in the UPDATE message can be used to construct a graph that describes the relationships of the various Autonomous Systems. By applying rules to be discussed, routing information loops and some other anomalies may be detected and removed from inter-AS routing.

An UPDATE message is used to advertise feasible routes that share common path attributes to a peer, or to withdraw multiple unfeasible routes from service (see Section 3.1). An UPDATE message MAY simultaneously advertise a feasible route and withdraw multiple unfeasible routes from service. The UPDATE message always includes the fixed-size BGP header, and also includes the other fields, as shown below (note that some of the shown fields may not be present in every UPDATE message):

      +-----------------------------------------------------+
      |   Withdrawn Routes Length (2 octets)                |
      +-----------------------------------------------------+
      |   Withdrawn Routes (variable)                       |
      +-----------------------------------------------------+
      |   Total Path Attribute Length (2 octets)            |
      +-----------------------------------------------------+
      |   Path Attributes (variable)                        |
      +-----------------------------------------------------+
      |   Network Layer Reachability Information (variable) |
      +-----------------------------------------------------+

   Withdrawn Routes Length:

      This 2-octet unsigned integer indicates the total length of the Withdrawn Routes field in octets.
      Its value allows the length of the Network Layer Reachability Information field to be determined, as specified below.

      A value of 0 indicates that no routes are being withdrawn from service, and that the WITHDRAWN ROUTES field is not present in this UPDATE message.

   Withdrawn Routes:

      This is a variable-length field that contains a list of IP address prefixes for the routes that are being withdrawn from service. Each IP address prefix is encoded as a 2-tuple of the form <length, prefix>, whose fields are described below:

         +---------------------------+
         |   Length (1 octet)        |
         +---------------------------+
         |   Prefix (variable)       |
         +---------------------------+

      The use and the meaning of these fields are as follows:

      a) Length:

         The Length field indicates the length in bits of the IP address prefix. A length of zero indicates a prefix that matches all IP addresses (with prefix, itself, of zero octets).

      b) Prefix:

         The Prefix field contains an IP address prefix, followed by the minimum number of trailing bits needed to make the end of the field fall on an octet boundary. Note that the value of the trailing bits is irrelevant.

   Total Path Attribute Length:

      This 2-octet unsigned integer indicates the total length of the Path Attributes field in octets. Its value allows the length of the Network Layer Reachability Information field to be determined, as specified below.

      A value of 0 indicates that neither the Network Layer Reachability Information field nor the Path Attribute field is present in this UPDATE message.

   Path Attributes:

      A variable-length sequence of path attributes is present in every UPDATE message, except for an UPDATE message that carries only the withdrawn routes. Each path attribute is a triple <attribute type, attribute length, attribute value> of variable length.
      Attribute Type is a two-octet field that consists of the Attribute Flags octet, followed by the Attribute Type Code octet.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Attr. Flags  |Attr. Type Code|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      The high-order bit (bit 0) of the Attribute Flags octet is the Optional bit. It defines whether the attribute is optional (if set to 1) or well-known (if set to 0).

      The second high-order bit (bit 1) of the Attribute Flags octet is the Transitive bit. It defines whether an optional attribute is transitive (if set to 1) or non-transitive (if set to 0).

      For well-known attributes, the Transitive bit MUST be set to 1. (See Section 5 for a discussion of transitive attributes.)

      The third high-order bit (bit 2) of the Attribute Flags octet is the Partial bit. It defines whether the information contained in the optional transitive attribute is partial (if set to 1) or complete (if set to 0). For well-known attributes and for optional non-transitive attributes, the Partial bit MUST be set to 0.

      The fourth high-order bit (bit 3) of the Attribute Flags octet is the Extended Length bit. It defines whether the Attribute Length is one octet (if set to 0) or two octets (if set to 1).

      The lower-order four bits of the Attribute Flags octet are unused. They MUST be zero when sent and MUST be ignored when received.

      The Attribute Type Code octet contains the Attribute Type Code. Currently defined Attribute Type Codes are discussed in Section 5.

      If the Extended Length bit of the Attribute Flags octet is set to 0, the third octet of the Path Attribute contains the length of the attribute data in octets.
      If the Extended Length bit of the Attribute Flags octet is set to 1, the third and fourth octets of the Path Attribute contain the length of the attribute data in octets.

      The remaining octets of the Path Attribute represent the attribute value and are interpreted according to the Attribute Flags and the Attribute Type Code. The supported Attribute Type Codes, and their attribute values and uses, are as follows:

      a) ORIGIN (Type Code 1):

         ORIGIN is a well-known mandatory attribute that defines the origin of the path information. The data octet can assume the following values:

            Value      Meaning

            0          IGP - Network Layer Reachability Information is interior to the originating AS

            1          EGP - Network Layer Reachability Information learned via the EGP protocol [RFC904]

            2          INCOMPLETE - Network Layer Reachability Information learned by some other means

         Usage of this attribute is defined in 5.1.1.

      b) AS_PATH (Type Code 2):

         AS_PATH is a well-known mandatory attribute that is composed of a sequence of AS path segments. Each AS path segment is represented by a triple <path segment type, path segment length, path segment value>.

         The path segment type is a 1-octet field with the following values defined:

            Value      Segment Type

            1          AS_SET: unordered set of ASs a route in the UPDATE message has traversed

            2          AS_SEQUENCE: ordered set of ASs a route in the UPDATE message has traversed

         The path segment length is a 1-octet field, containing the number of ASs (not the number of octets) in the path segment value field.

         The path segment value field contains one or more AS numbers, each encoded as a 2-octet field.

         Usage of this attribute is defined in 5.1.2.
      c) NEXT_HOP (Type Code 3):

         This is a well-known mandatory attribute that defines the (unicast) IP address of the router that SHOULD be used as the next hop to the destinations listed in the Network Layer Reachability Information field of the UPDATE message.

         Usage of this attribute is defined in 5.1.3.

      d) MULTI_EXIT_DISC (Type Code 4):

         This is an optional non-transitive attribute that is a four-octet unsigned integer. The value of this attribute MAY be used by a BGP speaker's Decision Process to discriminate among multiple entry points to a neighboring autonomous system.

         Usage of this attribute is defined in 5.1.4.

      e) LOCAL_PREF (Type Code 5):

         LOCAL_PREF is a well-known attribute that is a four-octet unsigned integer. A BGP speaker uses it to inform its other internal peers of the advertising speaker's degree of preference for an advertised route.

         Usage of this attribute is defined in 5.1.5.

      f) ATOMIC_AGGREGATE (Type Code 6):

         ATOMIC_AGGREGATE is a well-known discretionary attribute of length 0.

         Usage of this attribute is defined in 5.1.6.

      g) AGGREGATOR (Type Code 7):

         AGGREGATOR is an optional transitive attribute of length 6. The attribute contains the last AS number that formed the aggregate route (encoded as 2 octets), followed by the IP address of the BGP speaker that formed the aggregate route (encoded as 4 octets). This SHOULD be the same address as the one used for the BGP Identifier of the speaker.

         Usage of this attribute is defined in 5.1.7.

   Network Layer Reachability Information:

      This variable-length field contains a list of IP address prefixes.
      The length, in octets, of the Network Layer Reachability Information is not encoded explicitly, but can be calculated as:

         UPDATE message Length - 23 - Total Path Attribute Length - Withdrawn Routes Length

      where UPDATE message Length is the value encoded in the fixed-size BGP header, Total Path Attribute Length and Withdrawn Routes Length are the values encoded in the variable part of the UPDATE message, and 23 is the combined length of the fixed-size BGP header, the Total Path Attribute Length field, and the Withdrawn Routes Length field.

      Reachability information is encoded as one or more 2-tuples of the form <length, prefix>, whose fields are described below:

         +---------------------------+
         |   Length (1 octet)        |
         +---------------------------+
         |   Prefix (variable)       |
         +---------------------------+

      The use and the meaning of these fields are as follows:

      a) Length:

         The Length field indicates the length in bits of the IP address prefix. A length of zero indicates a prefix that matches all IP addresses (with prefix, itself, of zero octets).

      b) Prefix:

         The Prefix field contains an IP address prefix, followed by enough trailing bits to make the end of the field fall on an octet boundary. Note that the value of the trailing bits is irrelevant.

The minimum length of the UPDATE message is 23 octets: 19 octets for the fixed header + 2 octets for the Withdrawn Routes Length + 2 octets for the Total Path Attribute Length (the value of Withdrawn Routes Length is 0 and the value of Total Path Attribute Length is 0).

An UPDATE message can advertise at most one set of path attributes, but multiple destinations, provided that the destinations share these attributes. All path attributes contained in a given UPDATE message apply to all destinations carried in the NLRI field of the UPDATE message.
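The NLRI length formula and the <length, prefix> encoding can be exercised with a short decoder. This is a minimal sketch for IPv4 only, under stated assumptions: the function names are hypothetical, path attributes are returned as raw bytes rather than parsed, and error handling is reduced to raising `ValueError` instead of following the NOTIFICATION procedures of Section 6.

```python
import ipaddress
import struct

MARKER = b"\xff" * 16
UPDATE = 2


def decode_prefixes(data):
    """Decode a run of <length, prefix> 2-tuples into IPv4Network objects."""
    prefixes, i = [], 0
    while i < len(data):
        bits = data[i]
        if bits > 32:
            raise ValueError("prefix length exceeds 32 bits")
        octets = (bits + 7) // 8  # minimum number of octets that hold 'bits' bits
        raw = data[i + 1:i + 1 + octets]
        if len(raw) != octets:
            raise ValueError("truncated prefix")
        addr = int.from_bytes(raw.ljust(4, b"\x00"), "big")
        # The value of the trailing bits is irrelevant, so mask them off.
        mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
        prefixes.append(ipaddress.IPv4Network((addr & mask, bits)))
        i += 1 + octets
    return prefixes


def decode_update(message):
    """Return (withdrawn prefixes, raw path attributes, NLRI prefixes)."""
    if len(message) < 23 or message[:16] != MARKER:
        raise ValueError("bad header")
    length, msg_type = struct.unpack_from("!HB", message, 16)
    if msg_type != UPDATE or length != len(message) or length > 4096:
        raise ValueError("not a well-formed UPDATE")
    (wr_len,) = struct.unpack_from("!H", message, 19)
    if 23 + wr_len > length:
        raise ValueError("Withdrawn Routes Length too large")
    (tpa_len,) = struct.unpack_from("!H", message, 21 + wr_len)
    nlri_len = length - 23 - tpa_len - wr_len  # the formula given above
    if nlri_len < 0:
        raise ValueError("Total Path Attribute Length too large")
    attrs_start = 23 + wr_len
    withdrawn = message[21:21 + wr_len]
    attributes = message[attrs_start:attrs_start + tpa_len]
    nlri = message[attrs_start + tpa_len:]
    return decode_prefixes(withdrawn), attributes, decode_prefixes(nlri)


# Example: withdraw 203.0.113.0/24 and advertise 192.0.2.0/24 with
# ORIGIN=IGP, AS_PATH=AS_SEQUENCE(64500), NEXT_HOP=198.51.100.1.
withdrawn = bytes([24, 203, 0, 113])
attrs = bytes.fromhex("40010100" "4002040201fbf4" "400304c6336401")
nlri = bytes([24, 192, 0, 2])
body = (struct.pack("!H", len(withdrawn)) + withdrawn
        + struct.pack("!H", len(attrs)) + attrs + nlri)
msg = MARKER + struct.pack("!HB", 19 + len(body), UPDATE) + body
print(decode_update(msg))
```

Note that the decoder never reads an explicit NLRI length. The NLRI is simply whatever remains after the two length-prefixed fields, which is exactly what the formula expresses.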
An UPDATE message can list multiple routes that are to be withdrawn from service. Each such route is identified by its destination (expressed as an IP prefix), which unambiguously identifies the route in the context of the BGP speaker - BGP speaker connection to which it has been previously advertised.

An UPDATE message might advertise only routes to be withdrawn from service, in which case the message will not include path attributes or Network Layer Reachability Information. Conversely, it may advertise only a feasible route, in which case the WITHDRAWN ROUTES field need not be present.

An UPDATE message SHOULD NOT include the same address prefix in the WITHDRAWN ROUTES and Network Layer Reachability Information fields. However, a BGP speaker MUST be able to process UPDATE messages in this form. A BGP speaker SHOULD treat an UPDATE message of this form as though the WITHDRAWN ROUTES field does not contain the address prefix.

4.4 KEEPALIVE Message Format

BGP does not use any TCP-based keep-alive mechanism to determine if peers are reachable. Instead, KEEPALIVE messages are exchanged between peers often enough not to cause the Hold Timer to expire. A reasonable maximum time between KEEPALIVE messages would be one third of the Hold Time interval. KEEPALIVE messages MUST NOT be sent more frequently than one per second. An implementation MAY adjust the rate at which it sends KEEPALIVE messages as a function of the Hold Time interval.

If the negotiated Hold Time interval is zero, then periodic KEEPALIVE messages MUST NOT be sent.

A KEEPALIVE message consists of only the message header and has a length of 19 octets.

4.5 NOTIFICATION Message Format

A NOTIFICATION message is sent when an error condition is detected. The BGP connection is closed immediately after it is sent.
In addition to the fixed-size BGP header, the NOTIFICATION message contains the following fields:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Error code    | Error subcode |   Data (variable)             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Error Code:

      This 1-octet unsigned integer indicates the type of NOTIFICATION. The following Error Codes have been defined:

         Error Code       Symbolic Name               Reference

            1         Message Header Error           Section 6.1
            2         OPEN Message Error             Section 6.2
            3         UPDATE Message Error           Section 6.3
            4         Hold Timer Expired             Section 6.5
            5         Finite State Machine Error     Section 6.6
            6         Cease                          Section 6.7

   Error Subcode:

      This 1-octet unsigned integer provides more specific information about the nature of the reported error. Each Error Code may have one or more Error Subcodes associated with it. If no appropriate Error Subcode is defined, then a zero (Unspecific) value is used for the Error Subcode field.

      Message Header Error subcodes:

          1 - Connection Not Synchronized.
          2 - Bad Message Length.
          3 - Bad Message Type.

      OPEN Message Error subcodes:

          1 - Unsupported Version Number.
          2 - Bad Peer AS.
          3 - Bad BGP Identifier.
          4 - Unsupported Optional Parameter.
          5 - [Deprecated - see Appendix A].
          6 - Unacceptable Hold Time.

      UPDATE Message Error subcodes:

          1 - Malformed Attribute List.
          2 - Unrecognized Well-known Attribute.
          3 - Missing Well-known Attribute.
          4 - Attribute Flags Error.
          5 - Attribute Length Error.
          6 - Invalid ORIGIN Attribute.
          7 - [Deprecated - see Appendix A].
          8 - Invalid NEXT_HOP Attribute.
          9 - Optional Attribute Error.
         10 - Invalid Network Field.
         11 - Malformed AS_PATH.
   Data:

      This variable-length field is used to diagnose the reason for the NOTIFICATION. The contents of the Data field depend upon the Error Code and Error Subcode. See Section 6 for more details.

      Note that the length of the Data field can be determined from the message Length field by the formula:

         Message Length = 21 + Data Length

The minimum length of the NOTIFICATION message is 21 octets (including the message header).

5. Path Attributes

This section discusses the path attributes of the UPDATE message.

Path attributes fall into four separate categories:

   1. Well-known mandatory.
   2. Well-known discretionary.
   3. Optional transitive.
   4. Optional non-transitive.

BGP implementations MUST recognize all well-known attributes. Some of these attributes are mandatory and MUST be included in every UPDATE message that contains NLRI. Others are discretionary and may or may not be sent in a particular UPDATE message.

Once a BGP peer has updated any well-known attributes, it MUST pass these attributes to its peers in any updates it transmits.

In addition to well-known attributes, each path MAY contain one or more optional attributes. It is not required or expected that all BGP implementations support all optional attributes. The handling of an unrecognized optional attribute is determined by the setting of the Transitive bit in the attribute flags octet. Paths with unrecognized transitive optional attributes SHOULD be accepted. If a path with an unrecognized transitive optional attribute is accepted and passed along to other BGP peers, then the unrecognized transitive optional attribute of that path MUST be passed along with the path to other BGP peers with the Partial bit in the Attribute Flags octet set to 1.
If a path with a recognized, transitive optional attribute is accepted and passed along to other BGP peers and the Partial bit in the Attribute Flags octet is set to 1 by some previous AS, it MUST NOT be set back to 0 by the current AS. Unrecognized non-transitive optional attributes MUST be quietly ignored and not passed along to other BGP peers.

New transitive optional attributes MAY be attached to the path by the originator or by any other BGP speaker in the path. If they are not attached by the originator, the Partial bit in the Attribute Flags octet is set to 1. The rules for attaching new non-transitive optional attributes will depend on the nature of the specific attribute. The documentation of each new non-transitive optional attribute will be expected to include such rules (the description of the MULTI_EXIT_DISC attribute gives an example). All optional attributes (both transitive and non-transitive) MAY be updated (if appropriate) by BGP speakers in the path.

The sender of an UPDATE message SHOULD order path attributes within the UPDATE message in ascending order of attribute type. The receiver of an UPDATE message MUST be prepared to handle path attributes within UPDATE messages that are out of order.

The same attribute (attribute with the same type) cannot appear more than once within the Path Attributes field of a particular UPDATE message.

The mandatory category refers to an attribute that MUST be present in both IBGP and EBGP exchanges if NLRI are contained in the UPDATE message. Attributes classified as optional for the purpose of the protocol extension mechanism may be purely discretionary, or discretionary, required, or disallowed in certain contexts.
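The flag bits from Section 4.3 and the optional-attribute rules above combine into a small per-attribute decision when a route is propagated. The sketch below is a minimal illustration, assuming the attribute has already been split into flags, type code, and value; `RECOGNIZED`, `Attribute`, and `propagate_attribute` are hypothetical names. It omits the flag and length checks of Section 6.3 and the per-attribute usage rules of Section 5.1.

```python
from dataclasses import dataclass, replace
from typing import Optional

OPTIONAL = 0x80    # bit 0: optional (1) or well-known (0)
TRANSITIVE = 0x40  # bit 1: transitive (1) or non-transitive (0)
PARTIAL = 0x20     # bit 2: partial (1) or complete (0)

# Type codes this hypothetical implementation recognizes (those in Section 4.3).
RECOGNIZED = {1, 2, 3, 4, 5, 6, 7}


@dataclass(frozen=True)
class Attribute:
    flags: int
    type_code: int
    value: bytes


def propagate_attribute(attr: Attribute) -> Optional[Attribute]:
    """Return the attribute to pass along, or None if it is quietly dropped."""
    if attr.type_code in RECOGNIZED:
        # Recognized attributes follow their own rules in Section 5.1.
        # A Partial bit already set by a previous AS is left untouched.
        return attr
    if not attr.flags & OPTIONAL:
        # An unrecognized well-known attribute is an UPDATE error (Section 6.3).
        raise ValueError("unrecognized well-known attribute")
    if attr.flags & TRANSITIVE:
        # Unrecognized optional transitive: pass along with Partial set to 1.
        return replace(attr, flags=attr.flags | PARTIAL)
    # Unrecognized optional non-transitive: quietly ignore, do not propagate.
    return None


print(propagate_attribute(Attribute(0xC0, 99, b"\x01")))  # Partial bit added
print(propagate_attribute(Attribute(0x80, 99, b"\x01")))  # dropped
```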
      attribute           EBGP                    IBGP
      ORIGIN              mandatory               mandatory
      AS_PATH             mandatory               mandatory
      NEXT_HOP            mandatory               mandatory
      MULTI_EXIT_DISC     discretionary           discretionary
      LOCAL_PREF          see Section 5.1.5       required
      ATOMIC_AGGREGATE    see Sections 5.1.6 and 9.1.4
      AGGREGATOR          discretionary           discretionary

5.1 Path Attribute Usage

The usage of each BGP path attribute is described in the following clauses.

5.1.1 ORIGIN

ORIGIN is a well-known mandatory attribute. The ORIGIN attribute is generated by the speaker that originates the associated routing information. Its value SHOULD NOT be changed by any other speaker.

5.1.2 AS_PATH

AS_PATH is a well-known mandatory attribute. This attribute identifies the autonomous systems through which routing information carried in this UPDATE message has passed. The components of this list can be AS_SETs or AS_SEQUENCEs.

When a BGP speaker propagates a route it learned from another BGP speaker's UPDATE message, it modifies the route's AS_PATH attribute based on the location of the BGP speaker to which the route will be sent:

   a) When a given BGP speaker advertises the route to an internal peer, the advertising speaker SHALL NOT modify the AS_PATH attribute associated with the route.

   b) When a given BGP speaker advertises the route to an external peer, the advertising speaker updates the AS_PATH attribute as follows:

      1) if the first path segment of the AS_PATH is of type AS_SEQUENCE, the local system prepends its own AS number as the last element of the sequence (put it in the leftmost position with respect to the position of octets in the protocol message). If the act of prepending will cause an overflow in the AS_PATH segment (i.e.,
      more than 255 ASs), it SHOULD prepend a new segment of type AS_SEQUENCE and prepend its own AS number to this new segment.

      2) if the first path segment of the AS_PATH is of type AS_SET, the local system prepends a new path segment of type AS_SEQUENCE to the AS_PATH, including its own AS number in that segment.

      3) if the AS_PATH is empty, the local system creates a path segment of type AS_SEQUENCE, places its own AS into that segment, and places that segment into the AS_PATH.

When a BGP speaker originates a route:

   a) the originating speaker includes its own AS number in a path segment of type AS_SEQUENCE in the AS_PATH attribute of all UPDATE messages sent to an external peer. (In this case, the AS number of the originating speaker's autonomous system will be the only entry in the path segment, and this path segment will be the only segment in the AS_PATH attribute.)

   b) the originating speaker includes an empty AS_PATH attribute in all UPDATE messages sent to internal peers. (An empty AS_PATH attribute is one whose length field contains the value zero.)

Whenever the modification of the AS_PATH attribute calls for including or prepending the AS number of the local system, the local system MAY include/prepend more than one instance of its own AS number in the AS_PATH attribute. This is controlled via local configuration.

5.1.3 NEXT_HOP

The NEXT_HOP is a well-known mandatory attribute that defines the IP address of the router that SHOULD be used as the next hop to the destinations listed in the UPDATE message. The NEXT_HOP attribute is calculated as follows:
   1) When sending a message to an internal peer, if the route is not locally originated, the BGP speaker SHOULD NOT modify the NEXT_HOP attribute unless it has been explicitly configured to announce its own IP address as the NEXT_HOP. When announcing a locally originated route to an internal peer, the BGP speaker SHOULD use, as the NEXT_HOP, the interface address of the router through which the announced network is reachable for the speaker. If the route is directly connected to the speaker, or if the interface address of the router through which the announced network is reachable for the speaker is the internal peer's address, then the BGP speaker SHOULD use its own IP address for the NEXT_HOP attribute (the address of the interface that is used to reach the peer).

   2) When sending a message to an external peer X, and the peer is one IP hop away from the speaker:

      - If the route being announced was learned from an internal peer or is locally originated, the BGP speaker can use, for the NEXT_HOP attribute, an interface address of the internal peer router (or the internal router) through which the announced network is reachable for the speaker, provided that peer X shares a common subnet with this address. This is a form of "third party" NEXT_HOP attribute.

      - Otherwise, if the route being announced was learned from an external peer, the speaker can use, in the NEXT_HOP attribute, an IP address of any adjacent router (known from the received NEXT_HOP attribute) that the speaker itself uses for local route calculation, provided that peer X shares a common subnet with this address. This is a second form of "third party" NEXT_HOP attribute.
      - Otherwise, if the external peer to which the route is being advertised shares a common subnet with one of the interfaces of the announcing BGP speaker, the speaker MAY use the IP address associated with such an interface in the NEXT_HOP attribute. This is known as a "first party" NEXT_HOP attribute.

      - By default (if none of the above conditions apply), the BGP speaker SHOULD use, in the NEXT_HOP attribute, the IP address of the interface that the speaker uses to establish the BGP connection to peer X.

   3) When sending a message to an external peer X, and the peer is multiple IP hops away from the speaker (aka "multihop EBGP"):

      - The speaker MAY be configured to propagate the NEXT_HOP attribute. In this case, when advertising a route that the speaker learned from one of its peers, the NEXT_HOP attribute of the advertised route is exactly the same as the NEXT_HOP attribute of the learned route (the speaker does not modify the NEXT_HOP attribute).

      - By default, the BGP speaker SHOULD use, in the NEXT_HOP attribute, the IP address of the interface that the speaker uses to establish the BGP connection to peer X.

Normally, the NEXT_HOP attribute is chosen such that the shortest available path will be taken. A BGP speaker MUST be able to support the disabling of advertisement of third party NEXT_HOP attributes in order to handle imperfectly bridged media.

A route originated by a BGP speaker SHALL NOT be advertised to a peer using an address of that peer as NEXT_HOP. A BGP speaker SHALL NOT install a route with itself as the next hop.

The NEXT_HOP attribute is used by the BGP speaker to determine the actual outbound interface and immediate next-hop address that SHOULD be used to forward transit packets to the associated destinations.
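For an external peer one IP hop away (case 2 above), the choice among third-party, first-party, and default NEXT_HOP values can be sketched with Python's standard `ipaddress` module. This is a simplified illustration under stated assumptions. The parameter `candidate` is the third-party address the route would otherwise use: the internal router's interface address, or the received NEXT_HOP. "Shares a common subnet" is judged against the speaker's own attached subnets. The sketch always prefers a third-party address when one is allowed. The function and parameter names are hypothetical.

```python
import ipaddress
from typing import Optional


def ebgp_next_hop(candidate: Optional[str], peer_addr: str, interfaces: list,
                  session_addr: str, third_party: bool = True) -> str:
    """Pick a NEXT_HOP for external peer X, one IP hop away.

    candidate    -- third-party address for the route, or None (hypothetical input)
    peer_addr    -- address of peer X
    interfaces   -- speaker's interfaces as "address/prefixlen" strings
    session_addr -- local address of the BGP connection to peer X
    third_party  -- False models the required knob that disables third-party NEXT_HOPs
    """
    peer = ipaddress.ip_address(peer_addr)
    subnets = [ipaddress.ip_interface(i) for i in interfaces]
    if third_party and candidate is not None:
        cand = ipaddress.ip_address(candidate)
        # Third party: peer X and the candidate sit on one of our attached subnets.
        if any(cand in s.network and peer in s.network for s in subnets):
            return str(cand)
    # First party: one of our own interfaces shares a subnet with peer X.
    for s in subnets:
        if peer in s.network:
            return str(s.ip)
    # Default: the address of the interface used for the BGP connection to X.
    return session_addr


ifaces = ["192.0.2.1/24", "198.51.100.1/24"]
print(ebgp_next_hop("192.0.2.7", "192.0.2.9", ifaces, "192.0.2.1"))
```

With third-party NEXT_HOPs disabled, the same call falls through to the first-party address, which is the behavior needed on imperfectly bridged media.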
The immediate next-hop address is determined by performing a recursive route lookup operation for the IP address in the NEXT_HOP attribute, using the contents of the Routing Table, selecting one entry if multiple entries of equal cost exist. The Routing Table entry that resolves the IP address in the NEXT_HOP attribute will always specify the outbound interface. If the entry specifies an attached subnet, but does not specify a next-hop address, then the address in the NEXT_HOP attribute SHOULD be used as the immediate next-hop address. If the entry also specifies the next-hop address, this address SHOULD be used as the immediate next-hop address for packet forwarding.

5.1.4 MULTI_EXIT_DISC

The MULTI_EXIT_DISC is an optional non-transitive attribute that is intended to be used on external (inter-AS) links to discriminate among multiple exit or entry points to the same neighboring AS. The value of the MULTI_EXIT_DISC attribute is a four-octet unsigned number, which is called a metric. All other factors being equal, the exit point with the lower metric SHOULD be preferred. If received over EBGP, the MULTI_EXIT_DISC attribute MAY be propagated over IBGP to other BGP speakers within the same AS (see also 9.1.2.2). The MULTI_EXIT_DISC attribute received from a neighboring AS MUST NOT be propagated to other neighboring ASs.

A BGP speaker MUST implement a mechanism (based on local configuration) that allows the MULTI_EXIT_DISC attribute to be removed from a route. If a BGP speaker is configured to remove the MULTI_EXIT_DISC attribute from a route, then this removal MUST be done prior to determining the degree of preference of the route and prior to performing route selection (Decision Process phases 1 and 2).

An implementation MAY also (based on local configuration) alter the value of the MULTI_EXIT_DISC attribute received over EBGP.
   If a BGP speaker is configured to alter the value of the MULTI_EXIT_DISC attribute received over EBGP, then altering the value MUST be done prior to determining the degree of preference of the route and performing route selection (Decision Process phases 1 and 2). See Section 9.1.2.2 for necessary restrictions on this.

5.1.5 LOCAL_PREF

   LOCAL_PREF is a well-known attribute that SHALL be included in all UPDATE messages that a given BGP speaker sends to the other internal peers. A BGP speaker SHALL calculate the degree of preference for each external route based on the locally configured policy, and include the degree of preference when advertising a route to its internal peers. The higher degree of preference MUST be preferred. A BGP speaker uses the degree of preference learned via LOCAL_PREF in its Decision Process (see Section 9.1.1).

   A BGP speaker MUST NOT include this attribute in UPDATE messages that it sends to external peers, except for the case of BGP Confederations [RFC3065]. If it is contained in an UPDATE message that is received from an external peer, then this attribute MUST be ignored by the receiving speaker, except for the case of BGP Confederations [RFC3065].

5.1.6 ATOMIC_AGGREGATE

   ATOMIC_AGGREGATE is a well-known discretionary attribute.

   When a BGP speaker aggregates several routes for the purpose of advertisement to a particular peer, the AS_PATH of the aggregated route normally includes an AS_SET formed from the set of ASs from which the aggregate was formed. In many cases the network administrator can determine that the aggregate can safely be advertised without the AS_SET and without forming route loops.
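A simplified sketch of the AS_SET and ATOMIC_AGGREGATE rules may help. It assumes each component route's AS_PATH is flattened into a plain list of AS numbers; the full aggregation procedure (Section 9.2.2.2) also preserves common AS_SEQUENCE segments, which this sketch deliberately ignores. The hypothetical `aggregate_as_set` function returns the AS_SET for the aggregate and whether ATOMIC_AGGREGATE should be attached, following the rule that an aggregate which drops AS numbers by omitting the AS_SET SHOULD carry that attribute.

```python
from typing import FrozenSet, List, Tuple


def aggregate_as_set(
    component_paths: List[List[int]], drop_as_set: bool
) -> Tuple[FrozenSet[int], bool]:
    """Return (AS_SET for the aggregate, attach ATOMIC_AGGREGATE?)."""
    # The AS_SET is the set of all ASs the aggregated routes traversed.
    all_ases = frozenset(asn for path in component_paths for asn in path)
    if not drop_as_set:
        return all_ases, False
    # Dropping the AS_SET excludes every AS collected above; if that
    # excludes at least one AS number, ATOMIC_AGGREGATE SHOULD be attached.
    return frozenset(), len(all_ases) > 0
```

The `drop_as_set` flag stands in for the administrator's judgment, described above, that the aggregate can safely be advertised without the AS_SET.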
   If, as a result of dropping the AS_SET, an aggregate excludes at least some of the AS numbers present in the AS_PATH of the routes being aggregated, then the aggregated route, when advertised to the peer, SHOULD include the ATOMIC_AGGREGATE attribute.

   A BGP speaker that receives a route with the ATOMIC_AGGREGATE attribute SHOULD NOT remove the attribute from the route when propagating it to other speakers.

   A BGP speaker that receives a route with the ATOMIC_AGGREGATE attribute MUST NOT make any NLRI of that route more specific (as defined in Section 9.1.4) when advertising this route to other BGP speakers.

   A BGP speaker that receives a route with the ATOMIC_AGGREGATE attribute needs to be aware of the fact that the actual path to destinations, as specified in the NLRI of the route, while having the loop-free property, may not be the path specified in the AS_PATH attribute of the route.

5.1.7 AGGREGATOR

   AGGREGATOR is an optional transitive attribute which MAY be included in updates which are formed by aggregation (see Section 9.2.2.2). A BGP speaker which performs route aggregation MAY add the AGGREGATOR attribute, which SHALL contain its own AS number and IP address. The IP address SHOULD be the same as the BGP Identifier of the speaker.

6. BGP Error Handling.

   This section describes actions to be taken when errors are detected while processing BGP messages.

   When any of the conditions described here are detected, a NOTIFICATION message with the indicated Error Code, Error Subcode, and Data fields is sent, and the BGP connection is closed, unless it is explicitly stated that no NOTIFICATION message is to be sent and the BGP connection is not to be closed. If no Error Subcode is specified, then a zero MUST be used.
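As an illustration of the NOTIFICATION fields named above, the following Python sketch encodes a NOTIFICATION with the standard `struct` module. It assumes the BGP-4 message header layout (a 16-octet all-ones Marker, a 2-octet Length, and a 1-octet Type, with NOTIFICATION being Type 3) and the BGP-4 Error Code values 1 through 6; `build_notification` is a hypothetical helper name. The Error Subcode defaults to zero, matching the rule for errors without a specified subcode, and the Data field defaults to empty.

```python
import struct

MARKER = b"\xff" * 16
NOTIFICATION = 3

# BGP-4 NOTIFICATION Error Codes.
MESSAGE_HEADER_ERROR = 1
OPEN_MESSAGE_ERROR = 2
UPDATE_MESSAGE_ERROR = 3
HOLD_TIMER_EXPIRED = 4
FSM_ERROR = 5
CEASE = 6


def build_notification(error_code: int, error_subcode: int = 0,
                       data: bytes = b"") -> bytes:
    """Encode a NOTIFICATION message (subcode 0 when none is specified)."""
    body = struct.pack("!BB", error_code, error_subcode) + data
    # 19-octet header: Marker (16) + Length (2) + Type (1).
    length = 19 + len(body)
    if length > 4096:
        raise ValueError("message exceeds the 4096-octet maximum")
    return MARKER + struct.pack("!HB", length, NOTIFICATION) + body
```

For instance, `build_notification(CEASE)` yields a 21-octet message, the minimum NOTIFICATION length, since its Data field is empty.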
   The phrase "the BGP connection is closed" means that the TCP connection has been closed, the associated Adj-RIB-In has been cleared, and all resources for that BGP connection have been deallocated. Entries in the Loc-RIB associated with the remote peer are marked as invalid. The local system recalculates its best routes for the destinations of the routes marked as invalid. Before the invalid routes are deleted from the system, it advertises to its peers either withdrawals for the routes marked as invalid or the new best routes.

   Unless specified explicitly, the Data field of the NOTIFICATION message that is sent to indicate an error is empty.

6.1 Message Header error handling.

   All errors detected while processing the Message Header MUST be indicated by sending the NOTIFICATION message with Error Code Message Header Error. The Error Subcode elaborates on the specific nature of the error.

   The expected value of the Marker field of the message header is all ones. If the Marker field of the message header is not as expected, then a synchronization error has occurred and the Error Subcode MUST be set to Connection Not Synchronized.

   If at least one of the following is true:

      - the Length field of the message header is less than 19 or greater than 4096, or

      - the Length field of an OPEN message is less than the minimum length of the OPEN message, or

      - the Length field of an UPDATE message is less than the minimum length of the UPDATE message, or

      - the Length field of a KEEPALIVE message is not equal to 19, or

      - the Length field of a NOTIFICATION message is less than the minimum length of the NOTIFICATION message,

   then the Error Subcode MUST be set to Bad Message Length.
   The Data field MUST contain the erroneous Length field.

   If the Type field of the message header is not recognized, then the Error Subcode MUST be set to Bad Message Type. The Data field MUST contain the erroneous Type field.

6.2 OPEN message error handling.

   All errors detected while processing the OPEN message MUST be indicated by sending the NOTIFICATION message with Error Code OPEN Message Error. The Error Subcode elaborates on the specific nature of the error.

   If the version number contained in the Version field of the received OPEN message is not supported, then the Error Subcode MUST be set to Unsupported Version Number. The Data field is a 2-octet unsigned integer that indicates the largest locally supported version number less than the version the remote BGP peer bid (as indicated in the received OPEN message); if the smallest locally supported version number is greater than the version the remote BGP peer bid, the Data field instead indicates the smallest locally supported version number.

   If the Autonomous System field of the OPEN message is unacceptable, then the Error Subcode MUST be set to Bad Peer AS. The determination of acceptable Autonomous System numbers is outside the scope of this protocol.

   If the Hold Time field of the OPEN message is unacceptable, then the Error Subcode MUST be set to Unacceptable Hold Time. An implementation MUST reject Hold Time values of one or two seconds. An implementation MAY reject any proposed Hold Time. An implementation which accepts a Hold Time MUST use the negotiated value for the Hold Time.

   If the BGP Identifier field of the OPEN message is syntactically incorrect, then the Error Subcode MUST be set to Bad BGP Identifier. Syntactic correctness means that the BGP Identifier field represents a valid unicast IP host address.
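The Message Header checks of Section 6.1 and the Data computation for Unsupported Version Number can be sketched as below. The sketch assumes the BGP-4 type codes (OPEN 1, UPDATE 2, NOTIFICATION 3, KEEPALIVE 4), minimum lengths (OPEN 29, UPDATE 23, NOTIFICATION 21, KEEPALIVE 19 octets), and Message Header Error subcodes 1 to 3; `check_header` and `unsupported_version_data` are hypothetical helper names, and the caller is assumed to have read at least the full 19-octet header.

```python
import struct
from typing import Optional, Set, Tuple

OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4
MIN_LENGTH = {OPEN: 29, UPDATE: 23, NOTIFICATION: 21, KEEPALIVE: 19}

# Message Header Error subcodes.
CONNECTION_NOT_SYNCHRONIZED = 1
BAD_MESSAGE_LENGTH = 2
BAD_MESSAGE_TYPE = 3


def check_header(header: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (Error Subcode, Data) for a Message Header Error, else None."""
    if len(header) < 19:
        raise ValueError("need the full 19-octet header")
    if header[:16] != b"\xff" * 16:
        # Data is empty because no Data is specified for this error.
        return CONNECTION_NOT_SYNCHRONIZED, b""
    length_field = header[16:18]
    (length,) = struct.unpack("!H", length_field)
    msg_type = header[18]
    if not 19 <= length <= 4096:
        return BAD_MESSAGE_LENGTH, length_field
    if msg_type not in MIN_LENGTH:
        return BAD_MESSAGE_TYPE, bytes([msg_type])
    too_short = length < MIN_LENGTH[msg_type]
    if too_short or (msg_type == KEEPALIVE and length != 19):
        return BAD_MESSAGE_LENGTH, length_field
    return None


def unsupported_version_data(peer_bid: int, supported: Set[int]) -> bytes:
    """2-octet Data field for the Unsupported Version Number subcode."""
    lower = [v for v in supported if v < peer_bid]
    # Largest supported version below the bid, else the smallest supported.
    chosen = max(lower) if lower else min(supported)
    return struct.pack("!H", chosen)
```

Returning the raw Length or Type octets lets the caller copy them directly into the Data field of the NOTIFICATION, as the rules above require.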
   If one of the Optional Parameters in the OPEN message is not recognized, then the Error Subcode MUST be set to Unsupported Optional Parameters.

   If one of the Optional Parameters in the OPEN message is recognized but is malformed, then the Error Subcode MUST be set to 0 (Unspecific).

6.3 UPDATE message error handling.

   All errors detected while processing the UPDATE message MUST be indicated by sending the NOTIFICATION message with Error Code UPDATE Message Error. The Error Subcode elaborates on the specific nature of the error.

   Error checking of an UPDATE message begins by examining the path attributes. If the Withdrawn Routes Length or Total Attribute Length is too large (i.e., if Withdrawn Routes Length + Total Attribute Length + 23 exceeds the message Length), then the Error Subcode MUST be set to Malformed Attribute List.

   If any recognized attribute has Attribute Flags that conflict with the Attribute Type Code, then the Error Subcode MUST be set to Attribute Flags Error. The Data field MUST contain the erroneous attribute (type, length and value).

   If any recognized attribute has an Attribute Length that conflicts with the expected length (based on the attribute type code), then the Error Subcode MUST be set to Attribute Length Error. The Data field MUST contain the erroneous attribute (type, length and value).

   If any of the mandatory well-known attributes are not present, then the Error Subcode MUST be set to Missing Well-known Attribute. The Data field MUST contain the Attribute Type Code of the missing well-known attribute.

   If any of the mandatory well-known attributes are not recognized, then the Error Subcode MUST be set to Unrecognized Well-known Attribute. The Data field MUST contain the unrecognized attribute (type, length and value).
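The first UPDATE check above, that Withdrawn Routes Length + Total Attribute Length + 23 must not exceed the message Length, can be sketched as follows. The sketch assumes `message` holds exactly one complete BGP message whose header has already passed the checks of Section 6.1, with the 2-octet Withdrawn Routes Length at offset 19 and the 2-octet Total Attribute Length immediately after the withdrawn routes. `update_lengths_ok` is a hypothetical name; a `False` result corresponds to the Malformed Attribute List subcode.

```python
import struct


def update_lengths_ok(message: bytes) -> bool:
    """True if Withdrawn Routes Length + Total Attribute Length + 23 <= Length."""
    (length,) = struct.unpack_from("!H", message, 16)
    (withdrawn_len,) = struct.unpack_from("!H", message, 19)
    attr_len_offset = 21 + withdrawn_len
    # The Total Attribute Length field itself must lie inside the message.
    if attr_len_offset + 2 > length:
        return False
    (attr_len,) = struct.unpack_from("!H", message, attr_len_offset)
    return withdrawn_len + attr_len + 23 <= length
```

The constant 23 is the 19-octet header plus the two 2-octet length fields, which is also the minimum UPDATE length.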
   If the ORIGIN attribute has an undefined value, then the Error Subcode MUST be set to Invalid Origin Attribute. The Data field MUST contain the unrecognized attribute (type, length and value).

   If the NEXT_HOP attribute field is syntactically incorrect, then the Error Subcode MUST be set to Invalid NEXT_HOP Attribute. The Data field MUST contain the incorrect attribute (type, length and value). Syntactic correctness means that the NEXT_HOP attribute represents a valid IP host address.

   The IP address in the NEXT_HOP MUST meet the following criteria to be considered semantically correct:

      a) It MUST NOT be the IP address of the receiving speaker.

      b) In the case of an EBGP session where the sender and receiver are one IP hop away from each other, either the IP address in the NEXT_HOP MUST be the sender's IP address (that is used to establish the BGP connection), or the interface associated with the NEXT_HOP IP address MUST share a common subnet with the receiving BGP speaker.

   If the NEXT_HOP attribute is semantically incorrect, the error SHOULD be logged, and the route SHOULD be ignored. In this case, a NOTIFICATION message SHOULD NOT be sent, and the connection SHOULD NOT be closed.

   The AS_PATH attribute is checked for syntactic correctness. If the path is syntactically incorrect, then the Error Subcode MUST be set to Malformed AS_PATH.

   If the UPDATE message is received from an external peer, the local system MAY check whether the leftmost (with respect to the position of octets in the protocol message) AS in the AS_PATH attribute is equal to the autonomous system number of the peer that sent the message. If the check determines that this is not the case, the Error Subcode MUST be set to Malformed AS_PATH.

   If an optional attribute is recognized, then the value of this attribute MUST be checked.
   If an error is detected, the attribute MUST be discarded, and the Error Subcode MUST be set to Optional Attribute Error. The Data field MUST contain the attribute (type, length and value).

   If any attribute appears more than once in the UPDATE message, then the Error Subcode MUST be set to Malformed Attribute List.

   The NLRI field in the UPDATE message is checked for syntactic validity. If the field is syntactically incorrect, then the Error Subcode MUST be set to Invalid Network Field.

   If a prefix in the NLRI field is semantically incorrect (e.g., an unexpected multicast IP address), an error SHOULD be logged locally, and the prefix SHOULD be ignored.

   An UPDATE message that contains correct path attributes, but no NLRI, SHALL be treated as a valid UPDATE message.

6.4 NOTIFICATION message error handling.

   If a peer sends a NOTIFICATION message, and the receiver of the message detects an error in that message, the receiver cannot use a NOTIFICATION message to report this error back to the peer. Any such error (e.g., an unrecognized Error Code or Error Subcode) SHOULD be noticed, logged locally, and brought to the attention of the administration of the peer. The means to do this, however, lies outside the scope of this document.

6.5 Hold Timer Expired error handling.

   If a system does not receive successive KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period specified in the Hold Time field of the OPEN message, then the NOTIFICATION message with the Hold Timer Expired Error Code is sent and the BGP connection is closed.

6.6 Finite State Machine error handling.

   Any error detected by the BGP Finite State Machine (e.g., receipt of an unexpected event) is indicated by sending the NOTIFICATION message with Error Code Finite State Machine Error.

6.7 Cease.
   In the absence of any fatal errors (that are indicated in this section), a BGP peer MAY choose, at any given time, to close its BGP connection by sending the NOTIFICATION message with Error Code Cease. However, the Cease NOTIFICATION message MUST NOT be used when a fatal error indicated by this section does exist.

   A BGP speaker MAY support the ability to impose a (locally configured) upper bound on the number of address prefixes the speaker is willing to accept from a neighbor. When the upper bound is reached, the speaker, under control of local configuration, either (a) discards new address prefixes from the neighbor (while maintaining the BGP connection with the neighbor), or (b) terminates the BGP connection with the neighbor. If the BGP speaker decides to terminate its BGP connection with a neighbor because the number of address prefixes received from the neighbor exceeds the locally configured upper bound, then the speaker MUST send to the neighbor a NOTIFICATION message with the Error Code Cease. The speaker MAY also log this locally.

6.8 BGP connection collision detection.

   If a pair of BGP speakers simultaneously try to establish a BGP connection with each other, then two parallel connections between this pair of speakers might well be formed. If the source IP address used by one of these connections is the same as the destination IP address used by the other, and the destination IP address used by the first connection is the same as the source IP address used by the other, we refer to this situation as connection collision. Clearly, in the presence of connection collision, one of these connections MUST be closed.

   Based on the value of the BGP Identifier, a convention is established for detecting which BGP connection is to be preserved when a collision occurs.
   The convention is to compare the BGP Identifiers of the peers involved in the collision and to retain only the connection initiated by the BGP speaker with the higher-valued BGP Identifier.

   Upon receipt of an OPEN message, the local system MUST examine all of its connections that are in the OpenConfirm state. A BGP speaker MAY also examine connections in an OpenSent state if it knows the BGP Identifier of the peer by means outside of the protocol. If among these connections there is a connection to a remote BGP speaker whose BGP Identifier equals the one in the OPEN message, and this connection collides with the connection over which the OPEN message is received, then the local system performs the following collision resolution procedure:

      1. The BGP Identifier of the local system is compared to the BGP Identifier of the remote system (as specified in the OPEN message). Comparing BGP Identifiers is done by converting them to host byte order and treating them as 4-octet unsigned integers.

      2. If the value of the local BGP Identifier is less than the remote one, the local system closes the BGP connection that already exists (the one that is already in the OpenConfirm state), and accepts the BGP connection initiated by the remote system.

      3. Otherwise, the local system closes the newly created BGP connection (the one associated with the newly received OPEN message), and continues to use the existing one (the one that is already in the OpenConfirm state).

   Unless allowed via configuration, a connection collision with an existing BGP connection that is in the Established state causes the newly created connection to be closed.

   Note that a connection collision cannot be detected with connections that are in the Idle, Connect, or Active states.
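The three-step procedure above, together with the rule for collisions with an Established connection, can be sketched as a single decision function. BGP Identifiers are passed in dotted-quad form; converting them with Python's `ipaddress` module yields the same unsigned 32-bit value as reading the four octets from the wire and converting them to host byte order. The function name, the string state labels, and the `allow_established` flag (standing in for the configuration option, cf. CollisionDetectEstablishedState) are assumptions of this sketch.

```python
import ipaddress


def connection_to_close(
    local_id: str,
    remote_id: str,
    existing_state: str,
    allow_established: bool = False,
) -> str:
    """Return "existing" or "new": the colliding connection to close."""
    if existing_state == "Established" and not allow_established:
        # Default: the Established connection wins; close the newcomer.
        return "new"
    # Compare BGP Identifiers as 4-octet unsigned integers.
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local < remote:
        # Step 2: close the existing connection, accept the remote one.
        return "existing"
    # Step 3: close the connection carrying the newly received OPEN.
    return "new"
```

Whichever connection is selected is then closed by sending a NOTIFICATION with Error Code Cease.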
   Closing the BGP connection (that results from the collision resolution procedure) is accomplished by sending the NOTIFICATION message with the Error Code Cease.

7. BGP Version Negotiation

   BGP speakers MAY negotiate the version of the protocol by making multiple attempts to open a BGP connection, starting with the highest version number each supports. If an open attempt fails with an Error Code OPEN Message Error and an Error Subcode Unsupported Version Number, then the BGP speaker has available the version number it tried, the version number its peer tried, the version number passed by its peer in the NOTIFICATION message, and the version numbers that it supports. If the two peers do support one or more common versions, then this will allow them to rapidly determine the highest common version. In order to support BGP version negotiation, future versions of BGP MUST retain the format of the OPEN and NOTIFICATION messages.

8. BGP Finite State Machine (FSM)

   The data structures and FSM described in this document are conceptual and do not have to be implemented precisely as described here, as long as the implementations support the described functionality and their externally visible behavior is the same.

   This section specifies the BGP operation in terms of a Finite State Machine (FSM). The section falls into two parts:

      1) Description of Events for the State machine (Section 8.1)
      2) Description of the FSM (Section 8.2)

   Session attributes required (mandatory) for each connection are:

      1) State
      2) ConnectRetryCounter
      3) ConnectRetryTimer
      4) ConnectRetryTime
      5) HoldTimer
      6) HoldTime
      7) KeepaliveTimer
      8) KeepaliveTime

   The State session attribute indicates what state the BGP FSM is in.
   The ConnectRetryCounter indicates the number of times a BGP peer has tried to establish a peer session.

   The mandatory attributes related to timers are described in Section 10. Each timer has a "timer" and a "time" (the initial value).

   The optional Session attributes are listed below. These optional attributes may be supported either per connection or per local system:

      1) AcceptConnectionsUnconfiguredPeers
      2) AllowAutomaticStart
      3) AllowAutomaticStop
      4) CollisionDetectEstablishedState
      5) DampPeerOscillations
      6) DelayOpen
      7) DelayOpenTime
      8) DelayOpenTimer
      9) IdleHoldTime
      10) IdleHoldTimer
      11) PassiveTcpEstablishment
      12) SendNOTIFICATIONwithoutOPEN
      13) TrackTcpState

   The optional session attributes support different features of the BGP functionality that have implications for the BGP FSM state transitions. Two groups of the attributes which relate to timers are:

      group 1: DelayOpen, DelayOpenTime, DelayOpenTimer
      group 2: DampPeerOscillations, IdleHoldTime, IdleHoldTimer

   The first parameter (DelayOpen, DampPeerOscillations) is an optional attribute that indicates that the Timer function is active. The "Time" value specifies the initial value for the "Timer" (DelayOpenTime, IdleHoldTime). The "Timer" specifies the actual timer.

   Please refer to Section 8.1.1 for an explanation of the interaction between these optional attributes and the events signaled to the state machine. Section 8.2.1.3 also provides a short overview of the different types of optional attributes (flags or timers).

8.1 Events for the BGP FSM

8.1.1 Optional Events linked to Optional Session attributes

   The inputs to the BGP FSM are events. Events can either be mandatory or optional. Some optional events are linked to optional session attributes.
   Optional session attributes enable several groups of FSM functionality.

   The description below describes the linkage between FSM functionality, events, and the optional session attributes.

   Group 1: Automatic Administrative Events (Start/Stop)

      Optional Session Attributes: AllowAutomaticStart, AllowAutomaticStop, DampPeerOscillations, IdleHoldTime, IdleHoldTimer

      Option 1: AllowAutomaticStart

         Description: A BGP peer connection can be started and stopped by administrative control. This administrative control can either be manual, based on operator intervention, or under the control of logic specific to a BGP implementation. The term "automatic" refers to a start being issued to the BGP peer connection FSM when such logic determines that the BGP peer connection should be restarted.

         The AllowAutomaticStart attribute specifies that this BGP connection supports automatic starting of the BGP connection.

         If the BGP implementation supports AllowAutomaticStart, the peer may be repeatedly restarted. Three other options control the rate at which the automatic restart occurs: DampPeerOscillations, IdleHoldTime, and the IdleHoldTimer.

         The DampPeerOscillations option specifies that the implementation engages additional logic to damp the oscillations of BGP peers in the face of sequences of automatic start and automatic stop. IdleHoldTime specifies how long the BGP peer is held in the Idle state prior to allowing the next automatic restart. The IdleHoldTimer is the timer that runs to hold the peer in the Idle state.

         An example of DampPeerOscillations logic is an increase of the IdleHoldTime value if a BGP peer oscillates connectivity (connected/disconnected) repeatedly within a time period.
         To engage this logic, a peer could connect and disconnect 10 times within 5 minutes. The IdleHoldTime value would be reset from 0 to 120 seconds.

         Value: TRUE or FALSE

      Option 2: AllowAutomaticStop

         Description: This BGP peer session optional attribute indicates that the BGP connection allows "automatic" stopping of the BGP connection. An "automatic" stop is defined as a stop under the control of implementation-specific logic. The implementation-specific logic is outside the scope of this specification.

         Value: TRUE or FALSE

      Option 3: DampPeerOscillations

         Description: The DampPeerOscillations optional session attribute indicates that this BGP connection is using logic that damps BGP peer oscillations in the Idle state.

         Value: TRUE or FALSE

      Option 4: IdleHoldTime

         Description: The IdleHoldTime is the value that is set in the IdleHoldTimer.

         Value: Time in seconds

      Option 5: IdleHoldTimer

         Description: The IdleHoldTimer aids in controlling BGP peer oscillation. The IdleHoldTimer is used to keep the BGP peer in the Idle state for a particular duration. The IdleHoldTimer_Expires event is described in Section 8.1.3.

         Value: Time in seconds

   Group 2: Unconfigured Peers

      Optional Session Attributes: AcceptConnectionsUnconfiguredPeers

      Option 1: AcceptConnectionsUnconfiguredPeers

         Description: The BGP FSM optionally allows the acceptance of BGP peer connections from neighbors that are not pre-configured. The "AcceptConnectionsUnconfiguredPeers" optional session attribute allows the FSM to support the state transitions that allow the implementation to accept or reject these unconfigured peers.

         The AcceptConnectionsUnconfiguredPeers attribute has security implications.
         Please refer to the BGP Vulnerabilities document [BGP_VULN] for details.

         Value: TRUE or FALSE

   Group 3: TCP processing

      Optional Session Attributes: PassiveTcpEstablishment, TrackTcpState

      Option 1: PassiveTcpEstablishment

         Description: This option indicates that the BGP FSM will passively wait for the remote BGP peer to establish the BGP TCP connection.

         Value: TRUE or FALSE

      Option 2: TrackTcpState

         Description: The BGP FSM normally tracks the end result of a TCP connection attempt rather than individual TCP messages. Optionally, the BGP FSM can support additional interaction with the TCP connection negotiation. The interaction with the TCP events may increase the amount of logging the BGP peer connection requires and the number of BGP FSM changes.

         Value: TRUE or FALSE

   Group 4: BGP Message Processing

      Optional Session Attributes: DelayOpen, DelayOpenTime, DelayOpenTimer, SendNOTIFICATIONwithoutOPEN, CollisionDetectEstablishedState

      Option 1: DelayOpen

         Description: The DelayOpen optional session attribute allows implementations to be configured to delay sending an OPEN message for a specific time period (DelayOpenTime). The delay allows the remote BGP peer time to send the first OPEN message.

         Value: TRUE or FALSE

      Option 2: DelayOpenTime

         Description: The DelayOpenTime is the initial value that is set in the DelayOpenTimer.

         Value: Time in seconds

      Option 3: DelayOpenTimer

         Description: The DelayOpenTimer optional session attribute is used to delay the sending of an OPEN message on a connection. The DelayOpenTimer_Expires event (Event 12) is described in Section 8.1.3.
         Value: Time in seconds

      Option 4: SendNOTIFICATIONwithoutOPEN

         Description: The SendNOTIFICATIONwithoutOPEN attribute allows a peer to send a NOTIFICATION without first sending an OPEN message. Without this optional session attribute, the BGP connection assumes that an OPEN message must be sent by a peer prior to the peer sending a NOTIFICATION message.

         Value: TRUE or FALSE

      Option 5: CollisionDetectEstablishedState

         Description: Normally, a detected collision (see Section 6.8) is ignored in the Established state. This optional session attribute indicates that this BGP connection processes collisions in the Established state.

         Value: TRUE or FALSE

   Note: The optional session attributes clarify the BGP FSM description for existing features of BGP implementations. The optional session attributes may be pre-defined for an implementation and not readable via management interfaces for existing correct implementations. As newer BGP MIBs (version 2 and beyond) are supported, these fields will be accessible via a management interface.

8.1.2 Administrative Events

   An administrative event is an event in which the operator interface and BGP Policy engine signal the BGP finite state machine to start or stop the BGP state machine. The basic start and stop indications are augmented by optional connection attributes to signal a certain type of start or stop mechanism to the BGP FSM. An example of this combination is Event 5, AutomaticStart_with_PassiveTcpEstablishment. With this event, the BGP implementation signals to the BGP FSM that the implementation is using an Automatic Start with the option to use Passive TCP Establishment. Passive TCP Establishment signals that this BGP FSM will wait for the remote side to start the TCP establishment.
   Please note that only Event 1 (ManualStart) and Event 2 (ManualStop) are mandatory administrative events. All other administrative events (Events 3-8) are optional. Each event below has a name, definition, status (mandatory or optional), and the optional session attributes that SHOULD be set at each stage. When generating Event 1 through Event 8 for the BGP FSM, the conditions specified in the "Optional Attribute Status" section are verified. If any of these conditions are not satisfied, then the local system should log an FSM error.

   The settings of optional session attributes may be implicit in some implementations and therefore may not be set explicitly by an external operator action. Section 8.2.1.5 describes these implicit settings of the optional session attributes. The administrative states described below may also be implicit in some implementations and not directly configurable by an external operator.

   Event 1: ManualStart

      Definition: Local system administrator manually starts the peer connection.

      Status: Mandatory

      Optional Attribute Status: The PassiveTcpEstablishment attribute SHOULD be set to FALSE.

   Event 2: ManualStop

      Definition: Local system administrator manually stops the peer connection.

      Status: Mandatory

      Optional Attribute Status: No interaction with any optional attributes.

   Event 3: AutomaticStart

      Definition: Local system automatically starts the BGP connection.

      Status: Optional, depending on local system

      Optional Attribute Status:
         1) The AllowAutomaticStart attribute SHOULD be set to TRUE if this event occurs.
         2) If the PassiveTcpEstablishment optional session attribute is supported, it SHOULD be set to FALSE.
         3) If the DampPeerOscillations attribute is supported, it SHOULD be set to FALSE when this event occurs.
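The verification of "Optional Attribute Status" conditions described above can be sketched for Events 1 through 3. The `SessionOptions` record and the `check_event_preconditions` function are hypothetical; the sketch assumes that both PassiveTcpEstablishment and DampPeerOscillations are supported (so the "if supported" conditions of Event 3 always apply) and uses Python's standard `logging` module to record the FSM error. Events 4 through 8 would add further branches in the same style.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("bgp.fsm")


@dataclass
class SessionOptions:
    """Hypothetical holder for a few optional session attributes."""
    allow_automatic_start: bool = False
    passive_tcp_establishment: bool = False
    damp_peer_oscillations: bool = False


def check_event_preconditions(event: int, opts: SessionOptions) -> bool:
    """Verify the Optional Attribute Status of Events 1-3; log on mismatch."""
    if event == 1:  # ManualStart
        ok = not opts.passive_tcp_establishment
    elif event == 2:  # ManualStop: no interaction with optional attributes
        ok = True
    elif event == 3:  # AutomaticStart
        ok = (
            opts.allow_automatic_start
            and not opts.passive_tcp_establishment
            and not opts.damp_peer_oscillations
        )
    else:
        raise ValueError(f"event {event} is not covered by this sketch")
    if not ok:
        log.warning("FSM error: attribute status mismatch for Event %d", event)
    return ok
```

Because the conditions are SHOULD-level, the sketch only logs and reports a mismatch; whether the event is then still delivered to the FSM is left to the caller.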
   Event 4: ManualStart_with_PassiveTcpEstablishment

      Definition: Local system administrator manually starts the peer connection, but has PassiveTcpEstablishment enabled. The PassiveTcpEstablishment optional attribute indicates that the peer will listen prior to establishing the connection.

      Status: Optional, depending on local system

      Optional Attribute Status:
         1) The PassiveTcpEstablishment attribute SHOULD be set to TRUE if this event occurs.
         2) The DampPeerOscillations attribute SHOULD be set to FALSE when this event occurs.

   Event 5: AutomaticStart_with_PassiveTcpEstablishment

      Definition: Local system automatically starts the BGP connection with PassiveTcpEstablishment enabled. The PassiveTcpEstablishment optional attribute indicates that the peer will listen prior to establishing a connection.

      Status: Optional, depending on local system

      Optional Attribute Status:
         1) The AllowAutomaticStart attribute SHOULD be set to TRUE.
         2) The PassiveTcpEstablishment attribute SHOULD be set to TRUE.
         3) If the DampPeerOscillations attribute is supported, it SHOULD be set to FALSE.

   Event 6: AutomaticStart_with_DampPeerOscillations

      Definition: Local system automatically starts the BGP peer connection with peer oscillation damping enabled. The exact method of damping persistent peer oscillations is left up to the implementation and is outside the scope of this document.

      Status: Optional, depending on local system

      Optional Attribute Status:
         1) The AllowAutomaticStart attribute SHOULD be set to TRUE.
         2) The DampPeerOscillations attribute SHOULD be set to TRUE.
         3) The PassiveTcpEstablishment attribute SHOULD be set to FALSE.
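Read together, Events 1 and 3 through 7 form a small decision table over three inputs: whether the start is manual, whether PassiveTcpEstablishment is enabled, and whether DampPeerOscillations is enabled. A hypothetical `start_event` helper could map those inputs to the administrative start event to signal, as sketched below. Because neither manual start event sets DampPeerOscillations to TRUE (and Event 4 expects it to be FALSE), the sketch rejects that combination rather than guessing; this is a design choice of the sketch, not a requirement of this document.

```python
def start_event(manual: bool, passive: bool, damp: bool) -> int:
    """Map start options to the BGP FSM administrative start event number."""
    if manual:
        if damp:
            raise ValueError("no manual start event enables peer damping")
        # Event 4: ManualStart_with_PassiveTcpEstablishment; Event 1: ManualStart
        return 4 if passive else 1
    if damp:
        # Event 7: ..._with_DampPeerOscillations_and_PassiveTcpEstablishment
        # Event 6: AutomaticStart_with_DampPeerOscillations
        return 7 if passive else 6
    # Event 5: AutomaticStart_with_PassiveTcpEstablishment; Event 3: AutomaticStart
    return 5 if passive else 3
```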
      Event 7: AutomaticStart_with_DampPeerOscillations_and_
               PassiveTcpEstablishment

         Definition: Local system automatically starts the BGP peer
                     connection with peer oscillation damping enabled
                     and PassiveTcpEstablishment enabled. The exact
                     method of damping persistent peer oscillations is
                     left up to the implementation and is outside the
                     scope of this document.

         Status:     Optional, depending on local system

         Optional
         Attribute
         Status:     1) The AllowAutomaticStart attribute SHOULD be set
                        to TRUE.
                     2) The DampPeerOscillations attribute SHOULD be set
                        to TRUE.
                     3) The PassiveTcpEstablishment attribute SHOULD be
                        set to TRUE.

      Event 8: AutomaticStop

         Definition: Local system automatically stops the BGP
                     connection.

                     An example of an automatic stop event is the number
                     of prefixes for a given peer being exceeded and the
                     local system automatically disconnecting the peer.

         Status:     Optional, depending on local system

         Optional
         Attribute
         Status:     1) The AllowAutomaticStop attribute SHOULD be TRUE.

8.1.3 Timer Events

      Event 9: ConnectRetryTimer_Expires

         Definition: An event generated when the ConnectRetryTimer
                     expires.

         Status:     Mandatory

      Event 10: HoldTimer_Expires

         Definition: An event generated when the HoldTimer expires.

         Status:     Mandatory

      Event 11: KeepaliveTimer_Expires

         Definition: An event generated when the KeepaliveTimer expires.

         Status:     Mandatory

      Event 12: DelayOpenTimer_Expires

         Definition: An event generated when the DelayOpenTimer expires.
         Status:     Optional

         Optional
         Attribute
         Status:     If this event occurs,
                     1) the DelayOpen attribute SHOULD be set to TRUE,
                     2) the DelayOpenTime attribute SHOULD be supported,
                        and
                     3) the DelayOpenTimer SHOULD be supported.

      Event 13: IdleHoldTimer_Expires

         Definition: An event generated when the IdleHoldTimer expires,
                     indicating that the BGP connection has completed
                     waiting for the back-off period to prevent BGP peer
                     oscillation.

                     The IdleHoldTimer is only used when the persistent
                     peer oscillation damping function is enabled by
                     setting the DampPeerOscillations optional attribute
                     to TRUE.

                     Implementations not implementing the persistent
                     peer oscillation damping function may not have the
                     IdleHoldTimer.

         Status:     Optional

         Optional
         Attribute
         Status:     If this event occurs:
                     1) the DampPeerOscillations attribute SHOULD be set
                        to TRUE, and
                     2) the IdleHoldTimer SHOULD have just expired.

8.1.4 TCP Connection-Based Events

      Event 14: TcpConnection_Valid

         Definition: Event indicating the local system's reception of a
                     TCP connection request with a valid source IP
                     address and TCP port and a valid destination IP
                     address and TCP port. The definition of invalid
                     source and invalid destination IP addresses is left
                     to the implementation.

                     BGP's destination port SHOULD be port 179, as
                     defined by IANA.

                     A TCP connection request is denoted by the local
                     system receiving a TCP SYN.

         Status:     Optional

         Optional
         Attribute
         Status:     1) The TrackTcpState attribute SHOULD be set to
                        TRUE if this event occurs.

      Event 15: Tcp_CR_Invalid

         Definition: Event indicating the local system's reception of a
                     TCP connection request with either an invalid
                     source address or port number, or an invalid
                     destination address or port number.
                     The BGP destination port number SHOULD be 179, as
                     defined by IANA.

                     A TCP connection request occurs when the local
                     system receives a TCP SYN.

         Status:     Optional

         Optional
         Attribute
         Status:     1) The TrackTcpState attribute SHOULD be set to
                        TRUE if this event occurs.

      Event 16: Tcp_CR_Acked

         Definition: Event indicating the local system's request to
                     establish a TCP connection to the remote peer.

                     The local system's TCP connection sent a TCP SYN,
                     received a TCP SYN/ACK message, and sent a TCP ACK.

         Status:     Mandatory

      Event 17: TcpConnectionConfirmed

         Definition: Event indicating that the local system has received
                     a confirmation that the TCP connection has been
                     established by the remote site.

                     The remote peer's TCP engine sent a TCP SYN. The
                     local peer's TCP engine sent a SYN/ACK message and
                     has now received a final ACK.

         Status:     Mandatory

      Event 18: TcpConnectionFails

         Definition: Event indicating that the local system has received
                     a TCP connection failure notice.

                     The remote BGP peer's TCP machine could have sent a
                     FIN. The local peer would respond with a FIN-ACK.
                     Alternatively, the local peer could have detected
                     a timeout in the TCP connection and brought the
                     connection down.

         Status:     Mandatory

8.1.5 BGP Message-Based Events

      Event 19: BGPOpen

         Definition: An event is generated when a valid OPEN message has
                     been received.

         Status:     Mandatory

         Optional
         Attribute
         Status:     1) The DelayOpen optional attribute SHOULD be set
                        to FALSE.
                     2) The DelayOpenTimer SHOULD NOT be running.
      Event 20: BGPOpen with DelayOpenTimer running

         Definition: An event is generated when a valid OPEN message has
                     been received for a peer that has a successfully
                     established transport connection and is currently
                     delaying the sending of a BGP OPEN message.

         Status:     Optional

         Optional
         Attribute
         Status:     1) The DelayOpen attribute SHOULD be set to TRUE.
                     2) The DelayOpenTimer SHOULD be running.

      Event 21: BGPHeaderErr

         Definition: An event is generated when a received BGP message
                     header is not valid.

         Status:     Mandatory

      Event 22: BGPOpenMsgErr

         Definition: An event is generated when an OPEN message has been
                     received with errors.

         Status:     Mandatory

      Event 23: OpenCollisionDump

         Definition: An event generated administratively when a
                     connection collision has been detected while
                     processing an incoming OPEN message and this
                     connection has been selected to be disconnected.
                     See Section 6.8 for more information on collision
                     detection.

                     Event 23 is an administrative action generated by
                     implementation logic that determines that this
                     connection needs to be dropped per the rules in
                     Section 6.8. This event may occur if the FSM is
                     implemented as two linked state machines.

         Status:     Optional

         Optional
         Attribute
         Status:     If the state machine is to process this event in
                     the Established state,
                     1) the CollisionDetectEstablishedState optional
                        attribute SHOULD be set to TRUE.

                     Please note: The OpenCollisionDump event can occur
                     in the Idle, Connect, Active, OpenSent, and
                     OpenConfirm states without any optional attributes
                     being set.

      Event 24: NotifMsgVerErr

         Definition: An event is generated when a NOTIFICATION message
                     with "version error" is received.
         Status:     Mandatory

      Event 25: NotifMsg

         Definition: An event is generated when a NOTIFICATION message
                     is received and the error code is anything but
                     "version error".

         Status:     Mandatory

      Event 26: KeepAliveMsg

         Definition: An event is generated when a KEEPALIVE message is
                     received.

         Status:     Mandatory

      Event 27: UpdateMsg

         Definition: An event is generated when a valid UPDATE message
                     is received.

         Status:     Mandatory

      Event 28: UpdateMsgErr

         Definition: An event is generated when an invalid UPDATE
                     message is received.

         Status:     Mandatory

8.2 Description of FSM

8.2.1 FSM Definition

   BGP MUST maintain a separate FSM for each configured peer. Each BGP
   peer paired in a potential connection will attempt to connect to the
   other, unless it is configured to remain in the Idle state or
   configured to remain passive. For the purpose of this discussion, the
   active or connecting side of the TCP connection (the side of a TCP
   connection sending the first TCP SYN packet) is called outgoing. The
   passive or listening side (the sender of the first SYN/ACK) is called
   an incoming connection. (See Section 8.2.1.1 for information on the
   terms active and passive used below.)

   A BGP implementation MUST connect to and listen on TCP port 179 for
   incoming connections in addition to trying to connect to peers. For
   each incoming connection, a state machine MUST be instantiated.
   There exists a period in which the identity of the peer on the other
   end of an incoming connection is known, but the BGP Identifier is not
   known. During this time, both an incoming and an outgoing connection
   for the same configured peering may exist. This is referred to as a
   connection collision. (See Section 6.8.)
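The instantiation rules above can be pictured as a small registry that holds one FSM per configured peering plus one FSM per incoming TCP connection. The following is a minimal Python sketch, not a definitive implementation. It assumes that a configured peering is identified by its (local IP address, remote IP address) pair and that any FSM not in the Idle state counts as live. The names Peering, Fsm, and FsmRegistry are hypothetical, and the Section 6.8 procedure for resolving a collision is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical names: Peering, Fsm, and FsmRegistry are illustrative only.
Peering = Tuple[str, str]  # assumed key: (local IP address, remote IP address)


@dataclass(eq=False)  # compare FSMs by identity, not by field values
class Fsm:
    direction: str  # "outgoing" (configured peering) or "incoming"
    peering: Peering
    state: str = "Idle"


class FsmRegistry:
    def __init__(self) -> None:
        self.configured: Dict[Peering, Fsm] = {}
        self.incoming: List[Fsm] = []

    def configure(self, local_ip: str, remote_ip: str) -> Fsm:
        # BGP keeps a separate FSM for each configured peering.
        peering = (local_ip, remote_ip)
        if peering not in self.configured:
            self.configured[peering] = Fsm("outgoing", peering)
        return self.configured[peering]

    def accept(self, local_ip: str, remote_ip: str) -> Fsm:
        # A state machine MUST be instantiated for each incoming connection.
        fsm = Fsm("incoming", (local_ip, remote_ip))
        self.incoming.append(fsm)
        return fsm

    def fsms_for(self, peering: Peering) -> List[Fsm]:
        found = [f for f in self.incoming if f.peering == peering]
        if peering in self.configured:
            found.insert(0, self.configured[peering])
        return found

    def has_collision(self, peering: Peering) -> bool:
        # More than one live FSM for one configured peering is treated as a
        # connection collision; resolving it (Section 6.8) is not modeled.
        live = [f for f in self.fsms_for(peering) if f.state != "Idle"]
        return len(live) > 1

    def dispose(self, fsm: Fsm) -> None:
        # The FSM of the connection that is closed SHOULD be disposed of.
        if fsm in self.incoming:
            self.incoming.remove(fsm)
        else:
            fsm.state = "Idle"  # a configured peering's FSM persists, idle
```

In this sketch, the FSM of a configured peering is never deleted. It is only returned to Idle, because the peering itself stays configured. That is a design choice of the example, not a requirement of this document.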
   A BGP implementation will have at most one FSM for each configured
   peering, plus one FSM for each incoming TCP connection for which the
   peer has not yet been identified. Each FSM corresponds to exactly
   one TCP connection.

   There may be more than one connection between a pair of peers if the
   connections are configured to use a different pair of IP addresses.
   This is referred to as multiple "configured peerings" to the same
   peer.

8.2.1.1 Terms "active" and "passive"

   The terms active and passive have been in the Internet operator's
   vocabulary for almost a decade and have proven useful. The words
   active and passive have slightly different meanings when applied to
   a TCP connection and when applied to a peer. There is only one active
   side and one passive side to any one TCP connection, per the
   definition above and the state machine below. When a BGP speaker is
   configured active, it may end up on either the active or passive side
   of the connection that eventually gets established. Once the TCP
   connection is completed, it doesn't matter which end was active and
   which end was passive. The only difference is which side of the TCP
   connection has port number 179.

8.2.1.2 FSM and collision detection

   There is one FSM per BGP connection. When the connection collision
   occurs prior to determining what peer a connection is associated
   with, there may be two connections for one peer. After the
   connection collision is resolved (see Section 6.8), the FSM for the
   connection that is closed SHOULD be disposed of.

8.2.1.3 FSM and Optional Session Attributes

   Optional Session Attributes specify either attributes that act as
   flags (TRUE or FALSE) or optional timers.
   For optional attributes that act as flags, if the optional session
   attribute can be set to TRUE on the system, the corresponding BGP FSM
   actions must be supported. For example, if the following options
   can be set in a BGP implementation: AutoStart and
   PassiveTcpEstablishment, then Events 3, 4, and 5 must be supported.
   If an Optional Session attribute cannot be set to TRUE, the events
   supporting that set of options do not have to be supported.

   Each of the optional timers (DelayOpenTimer and IdleHoldTimer) has a
   group of attributes that are:

      - a flag indicating support,
      - the time set in the timer, and
      - the timer itself.

   The two optional timers show this format:

      DelayOpenTimer: DelayOpen, DelayOpenTime, DelayOpenTimer
      IdleHoldTimer:  DampPeerOscillations, IdleHoldTime, IdleHoldTimer

   If the flag indicating support for an optional timer (DelayOpen or
   DampPeerOscillations) cannot be set to TRUE, the timers and events
   supporting that option do not have to be supported.

8.2.1.4 FSM Event numbers

   The event numbers (1-28) used in this state machine description aid
   in specifying the behavior of the BGP state machine. Implementations
   MAY use these numbers to provide network management information.
   The exact form of an FSM or the FSM events is specific to each
   implementation.

8.2.1.5 FSM actions that are implementation dependent

   The BGP FSM specifies at certain points that BGP initialization will
   occur or that BGP resources will be deleted. The initialization of
   the BGP FSM and the associated resources depends on the policy
   portion of the BGP implementation. The details of these actions are
   outside the scope of the FSM document.

8.2.2 Finite State Machine

   Idle state:

      Initially, the BGP peer FSM is in the Idle state. (Hereafter, the
      BGP peer FSM will be shortened to BGP FSM.)
      In this state, BGP FSM refuses all incoming BGP connections for
      this peer. No resources are allocated to the peer. In response to
      a ManualStart event (Event 1) or an AutomaticStart event
      (Event 3), the local system:

        - initializes all BGP resources for the peer connection,
        - sets ConnectRetryCounter to zero,
        - starts the ConnectRetryTimer with the initial value,
        - initiates a TCP connection to the other BGP peer,
        - listens for a connection that may be initiated by the remote
          BGP peer, and
        - changes its state to Connect.

      The ManualStop event (Event 2) and AutomaticStop event (Event 8)
      are ignored in the Idle state.

      In response to a ManualStart_with_PassiveTcpEstablishment event
      (Event 4) or an AutomaticStart_with_PassiveTcpEstablishment event
      (Event 5), the local system:

        - initializes all BGP resources,
        - sets the ConnectRetryCounter to zero,
        - starts the ConnectRetryTimer with the initial value,
        - listens for a connection that may be initiated by the remote
          peer, and
        - changes its state to Active.

      The exact value of the ConnectRetryTimer is a local matter, but it
      SHOULD be sufficiently large to allow TCP initialization.

      If the DampPeerOscillations attribute is set to TRUE, the
      following three additional events may occur within the Idle state:

        - AutomaticStart_with_DampPeerOscillations (Event 6),
        - AutomaticStart_with_DampPeerOscillations_and_
          PassiveTcpEstablishment (Event 7), and
        - IdleHoldTimer_Expires (Event 13).

      Upon receiving these three events, the local system will use them
      to prevent peer oscillations. The method of preventing persistent
      peer oscillation is outside the scope of this document.

      Any other event (Events 9-12, 15-28) received in the Idle state
      does not cause a change in the state of the local system.
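The Idle-state rules above reduce to a small transition function. This is a minimal Python sketch. It assumes the session is a plain dictionary, and its keys (`connect_retry_time`, `initiate_tcp`, `listening`, and so on) and the function name `idle_transition` are hypothetical. Because the damping method is outside the scope of this document, Events 6, 7, and 13 are not modeled and raise an error instead.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"


def idle_transition(event: int, session: dict) -> State:
    """Sketch of the Idle-state rules; `session` keys are hypothetical."""
    if event in (1, 3, 4, 5):
        # ManualStart / AutomaticStart, with or without passive TCP.
        session["resources_initialized"] = True
        session["connect_retry_counter"] = 0
        session["connect_retry_timer"] = session["connect_retry_time"]
        session["listening"] = True  # always listen for the remote peer
        passive = event in (4, 5)  # the PassiveTcpEstablishment variants
        session["initiate_tcp"] = not passive
        return State.ACTIVE if passive else State.CONNECT
    if event in (6, 7, 13):
        # Only meaningful with DampPeerOscillations; the damping method is
        # outside the scope of this document, so it is not modeled here.
        raise NotImplementedError(
            "peer oscillation damping is implementation specific")
    # ManualStop (2), AutomaticStop (8), and any other event: no change.
    return State.IDLE
```

The sketch merges the active-start and passive-start branches, because the two lists above differ only in whether a TCP connection is initiated and in the target state.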
   Connect State:

      In this state, BGP FSM is waiting for the TCP connection to be
      completed.

      The start events (Events 1, 3-7) are ignored in the Connect state.

      In response to a ManualStop event (Event 2), the local system:

        - drops the TCP connection,
        - releases all BGP resources,
        - sets ConnectRetryCounter to zero,
        - stops the ConnectRetryTimer and sets ConnectRetryTimer to
          zero, and
        - changes its state to Idle.

      In response to the ConnectRetryTimer_Expires event (Event 9), the
      local system:

        - drops the TCP connection,
        - restarts the ConnectRetryTimer,
        - stops the DelayOpenTimer and resets the timer to zero,
        - initiates a TCP connection to the other BGP peer,
        - continues to listen for a connection that may be initiated by
          the remote BGP peer, and
        - stays in the Connect state.

      If the DelayOpenTimer_Expires event (Event 12) occurs in the
      Connect state, the local system:

        - sends an OPEN message to its peer,
        - sets the HoldTimer to a large value, and
        - changes its state to OpenSent.

      If the BGP FSM receives a TcpConnection_Valid event (Event 14),
      the TCP connection is processed, and the connection remains in the
      Connect state.

      If the BGP FSM receives a Tcp_CR_Invalid event (Event 15), the
      local system rejects the TCP connection, and the connection
      remains in the Connect state.

      If the TCP connection succeeds (Event 16 or Event 17), the local
      system checks the DelayOpen attribute prior to processing. If the
      DelayOpen attribute is set to TRUE, the local system:

        - stops the ConnectRetryTimer (if running) and sets the
          ConnectRetryTimer to zero,
        - sets the DelayOpenTimer to the initial value, and
        - stays in the Connect state.
      If the DelayOpen attribute is set to FALSE, the local system:

        - stops the ConnectRetryTimer (if running) and sets the
          ConnectRetryTimer to zero,
        - completes BGP initialization,
        - sends an OPEN message to its peer,
        - sets the HoldTimer to a large value, and
        - changes its state to OpenSent.

      A HoldTimer value of 4 minutes is suggested.

      If the TCP connection fails (Event 18), the local system checks
      the DelayOpenTimer. If the DelayOpenTimer is running, the local
      system:

        - restarts the ConnectRetryTimer with the initial value,
        - stops the DelayOpenTimer and resets its value to zero,
        - continues to listen for a connection that may be initiated by
          the remote BGP peer, and
        - changes its state to Active.

      If the DelayOpenTimer is not running, the local system:

        - stops the ConnectRetryTimer and sets it to zero,
        - drops the TCP connection,
        - releases all BGP resources, and
        - changes its state to Idle.

      If an OPEN message is received while the DelayOpenTimer is
      running (Event 20), the local system:

        - stops the ConnectRetryTimer (if running) and sets the
          ConnectRetryTimer to zero,
        - completes the BGP initialization,
        - stops and clears the DelayOpenTimer (sets the value to zero),
        - sends an OPEN message,
        - sends a KEEPALIVE message,
        - if the HoldTimer initial value is non-zero,
            - starts the KeepaliveTimer with the initial value and
            - resets the HoldTimer to the negotiated value,
          else, if the HoldTimer initial value is zero,
            - resets the KeepaliveTimer and
            - resets the HoldTimer value to zero,
        - and changes its state to OpenConfirm.

      If the value of the Autonomous System field is the same as the
      local Autonomous System number, set the connection status to an
      internal connection; otherwise it is "external".
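The timer branch of Event 20 and the internal/external classification can be sketched as follows. This is a minimal Python illustration, not the FSM's required form. Timer values are in seconds, and 0 is taken to mean "not running". The rule in `negotiate_hold_time` (use the smaller of the configured and received Hold Time) is an assumption drawn from Section 4.2 rather than from the text above. The function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timers:
    hold: int       # seconds; 0 means the HoldTimer is not running
    keepalive: int  # seconds; 0 means the KeepaliveTimer is not running


def negotiate_hold_time(local: int, received: int) -> int:
    # Assumption taken from Section 4.2: use the smaller of the locally
    # configured Hold Time and the Hold Time received in the OPEN message.
    return min(local, received)


def event20_timers(hold_initial: int, negotiated_hold: int,
                   keepalive_initial: int) -> Timers:
    """Timer settings after Event 20 (OPEN while DelayOpenTimer runs)."""
    if hold_initial != 0:
        return Timers(hold=negotiated_hold, keepalive=keepalive_initial)
    return Timers(hold=0, keepalive=0)  # both timers reset to zero


def connection_kind(open_as: int, local_as: int) -> str:
    # Same AS in the OPEN message's Autonomous System field => internal.
    return "internal" if open_as == local_as else "external"
```

For example, with a configured Hold Time of 90 seconds and a received one of 180, the sketch's HoldTimer becomes 90.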
      If BGP message header checking detects an error (Event 21) or OPEN
      message checking detects an error (Event 22) (see Section 6.2),
      the local system:

        - (optionally) if the SendNOTIFICATIONwithoutOPEN attribute is
          set to TRUE, first sends a NOTIFICATION message with the
          appropriate error code, and then
        - stops the ConnectRetryTimer (if running) and sets the
          ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

      If a NOTIFICATION message is received with a version error
      (Event 24), the local system checks the DelayOpenTimer. If the
      DelayOpenTimer is running, the local system:

        - stops the ConnectRetryTimer (if running) and sets the
          ConnectRetryTimer to zero,
        - stops and resets the DelayOpenTimer (sets to zero),
        - releases all BGP resources,
        - drops the TCP connection, and
        - changes its state to Idle.

      If the DelayOpenTimer is not running, the local system:

        - stops the ConnectRetryTimer and sets the ConnectRetryTimer to
          zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - performs peer oscillation damping if the DampPeerOscillations
          attribute is set to TRUE, and
        - changes its state to Idle.
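The two Event 24 branches in the Connect state differ only in whether the failure is counted. This is a minimal Python sketch of that difference. It assumes a dictionary-based session in which a timer value greater than zero means the timer is running. All keys and the function name are hypothetical, and the damping step is recorded only as a flag.

```python
def connect_on_version_error(session: dict) -> str:
    """Connect-state handling of NotifMsgVerErr (Event 24), as a sketch."""
    delay_open_running = session.get("delay_open_timer", 0) > 0
    session["connect_retry_timer"] = 0  # stopped and set to zero
    session["delay_open_timer"] = 0     # stopped (no-op if not running)
    session["resources"] = None         # release all BGP resources
    session["tcp_open"] = False         # drop the TCP connection
    if not delay_open_running:
        # Only this branch increments the counter and may damp.
        count = session.get("connect_retry_counter", 0)
        session["connect_retry_counter"] = count + 1
        if session.get("damp_peer_oscillations", False):
            session["damping_requested"] = True  # method is local matter
    return "Idle"
```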
      In response to any other events (Events 8, 10-11, 13, 19, 23,
      25-28), the local system:

        - if the ConnectRetryTimer is running, stops and resets the
          ConnectRetryTimer (sets to zero),
        - if the DelayOpenTimer is running, stops and resets the
          DelayOpenTimer (sets to zero),
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - performs peer oscillation damping if the DampPeerOscillations
          attribute is set to TRUE, and
        - changes its state to Idle.

   Active State:

      In this state, BGP FSM is trying to acquire a peer by listening
      for, and accepting, a TCP connection.

      The start events (Events 1, 3-7) are ignored in the Active state.

      In response to a ManualStop event (Event 2), the local system:

        - if the DelayOpenTimer is running and the
          SendNOTIFICATIONwithoutOPEN session attribute is set, sends a
          NOTIFICATION with a Cease,
        - releases all BGP resources, including stopping the
          DelayOpenTimer,
        - drops the TCP connection,
        - sets ConnectRetryCounter to zero,
        - stops the ConnectRetryTimer and sets the ConnectRetryTimer to
          zero, and
        - changes its state to Idle.

      In response to a ConnectRetryTimer_Expires event (Event 9), the
      local system:

        - restarts the ConnectRetryTimer (with the initial value),
        - initiates a TCP connection to the other BGP peer,
        - continues to listen for a TCP connection that may be initiated
          by the remote BGP peer, and
        - changes its state to Connect.

      If the local system receives a DelayOpenTimer_Expires event
      (Event 12), the local system:

        - sets the ConnectRetryTimer to zero,
        - stops and clears the DelayOpenTimer (sets to zero),
        - completes the BGP initialization,
        - sends the OPEN message to its remote peer,
        - sets its HoldTimer to a large value, and
        - changes its state to OpenSent.
      A HoldTimer value of 4 minutes is also suggested for this state
      transition.

      If the local system receives a TcpConnection_Valid event
      (Event 14), the local system processes the TCP connection flags
      and stays in the Active state.

      If the local system receives a Tcp_CR_Invalid event (Event 15),
      the local system rejects the TCP connection and stays in the
      Active state.

      In response to a TCP connection succeeding (Event 16 or
      Event 17), the local system checks the DelayOpen optional
      attribute prior to processing.

      If the DelayOpen attribute is set to TRUE, the local system:

        - stops the ConnectRetryTimer and sets the ConnectRetryTimer to
          zero,
        - sets the DelayOpenTimer to the initial value (DelayOpenTime),
          and
        - stays in the Active state.

      If the DelayOpen attribute is set to FALSE, the local system:

        - sets the ConnectRetryTimer to zero,
        - completes the BGP initialization,
        - sends the OPEN message to its peer,
        - sets its HoldTimer to a large value, and
        - changes its state to OpenSent.

      A HoldTimer value of 4 minutes is suggested as a "large value" for
      the HoldTimer.

      If the local system receives a TcpConnectionFails event
      (Event 18), the local system:

        - restarts the ConnectRetryTimer (with the initial value),
        - stops and clears the DelayOpenTimer (sets the value to zero),
        - releases all BGP resources,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.
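The Active-state handling of the TCP connection events (Events 14 to 18) can be sketched as a single function. This minimal Python sketch uses a dictionary session with hypothetical keys, and it takes the suggested 4-minute HoldTimer as the "large value". Peer oscillation damping on Event 18 is optional and is left out.

```python
LARGE_HOLD_TIME = 240  # seconds; the 4-minute "large value" suggested above


def active_on_tcp_event(event: int, s: dict) -> str:
    """Active-state handling of TCP events 14-18; `s` keys are hypothetical."""
    if event == 14:  # TcpConnection_Valid: process the connection, stay
        return "Active"
    if event == 15:  # Tcp_CR_Invalid: reject the connection, stay
        s["rejected"] = s.get("rejected", 0) + 1
        return "Active"
    if event in (16, 17):  # TCP connection succeeded
        s["connect_retry_timer"] = 0
        if s["delay_open"]:
            s["delay_open_timer"] = s["delay_open_time"]  # delay the OPEN
            return "Active"
        s["sent"].append("OPEN")
        s["hold_timer"] = LARGE_HOLD_TIME
        return "OpenSent"
    if event == 18:  # TcpConnectionFails
        s["connect_retry_timer"] = s["connect_retry_time"]  # restarted
        s["delay_open_timer"] = 0
        s["resources"] = None  # release all BGP resources
        s["connect_retry_counter"] += 1
        return "Idle"
    raise ValueError(f"Event {event} is not a TCP connection event")
```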
      If an OPEN message is received and the DelayOpenTimer is running
      (Event 20), the local system:

        - stops the ConnectRetryTimer (if running) and sets the
          ConnectRetryTimer to zero,
        - stops and clears the DelayOpenTimer (sets to zero),
        - completes the BGP initialization,
        - sends an OPEN message,
        - sends a KEEPALIVE message,
        - if the HoldTimer value is non-zero,
            - starts the KeepaliveTimer with the initial value and
            - resets the HoldTimer to the negotiated value,
          else, if the HoldTimer is zero,
            - resets the KeepaliveTimer (sets to zero) and
            - resets the HoldTimer to zero,
        - and changes its state to OpenConfirm.

      If the value of the Autonomous System field is the same as the
      local Autonomous System number, set the connection status to an
      internal connection; otherwise it is "external".

      If BGP message header checking detects an error (Event 21) or OPEN
      message checking detects an error (Event 22) (see Section 6.2),
      the local system:

        - (optionally) sends a NOTIFICATION message with the appropriate
          error code if the SendNOTIFICATIONwithoutOPEN attribute is set
          to TRUE,
        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

      If a NOTIFICATION message is received with a version error
      (Event 24), the local system checks the DelayOpenTimer. If the
      DelayOpenTimer is running, the local system:

        - stops the ConnectRetryTimer (if running) and sets the
          ConnectRetryTimer to zero,
        - stops and resets the DelayOpenTimer (sets to zero),
        - releases all BGP resources,
        - drops the TCP connection, and
        - changes its state to Idle.
      If the DelayOpenTimer is not running, the local system:

        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

      In response to any other event (Events 8, 10-11, 13, 19, 23,
      25-28), the local system:

        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

   OpenSent State:

      In this state, BGP FSM waits for an OPEN message from its peer.

      The start events (Events 1, 3-7) are ignored in the OpenSent
      state.

      If a ManualStop event (Event 2) is issued in the OpenSent state,
      the local system:

        - sends the NOTIFICATION with a Cease,
        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - sets the ConnectRetryCounter to zero, and
        - changes its state to Idle.

      If an AutomaticStop event (Event 8) is issued in the OpenSent
      state, the local system:

        - sends the NOTIFICATION with a Cease,
        - sets the ConnectRetryTimer to zero,
        - releases all the BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.
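In the OpenSent state, ManualStop and AutomaticStop differ only in how they treat the ConnectRetryCounter. The following minimal Python sketch shows that contrast. The dictionary keys and the function name are hypothetical, and the damping step is recorded only as a flag, because the damping method is implementation specific.

```python
def opensent_on_stop(event: int, s: dict) -> str:
    """OpenSent handling of ManualStop (2) and AutomaticStop (8)."""
    if event not in (2, 8):
        raise ValueError("only ManualStop and AutomaticStop are handled")
    s["sent"].append("NOTIFICATION(Cease)")
    s["connect_retry_timer"] = 0
    s["resources"] = None   # release all BGP resources
    s["tcp_open"] = False   # drop the TCP connection
    if event == 2:
        s["connect_retry_counter"] = 0   # ManualStop resets the counter
    else:
        s["connect_retry_counter"] += 1  # AutomaticStop increments it
        if s.get("damp_peer_oscillations", False):
            s["damping_requested"] = True  # and may request damping
    return "Idle"
```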
      If the HoldTimer_Expires event (Event 10) occurs, the local
      system:

        - sends a NOTIFICATION message with the Error Code Hold Timer
          Expired,
        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

      If a TcpConnection_Valid (Event 14), Tcp_CR_Acked (Event 16), or
      TcpConnectionConfirmed event (Event 17) is received, a second TCP
      connection may be in progress. This second TCP connection is
      tracked per Connection Collision processing (Section 6.8) until an
      OPEN message is received.

      A TCP connection request for an invalid port (Tcp_CR_Invalid
      (Event 15)) is ignored.

      If a TcpConnectionFails event (Event 18) is received, the local
      system:

        - closes the BGP connection,
        - restarts the ConnectRetryTimer,
        - continues to listen for a connection that may be initiated by
          the remote BGP peer, and
        - changes its state to Active.

      When an OPEN message is received, all fields are checked for
      correctness. If there are no errors in the OPEN message
      (Event 19), the local system:

        - resets the DelayOpenTimer to zero,
        - sets the BGP ConnectRetryTimer to zero,
        - sends a KEEPALIVE message,
        - sets a KeepaliveTimer (via the text below),
        - sets the HoldTimer according to the negotiated value (see
          Section 4.2), and
        - changes its state to OpenConfirm.

      If the negotiated hold time value is zero, then the HoldTimer and
      KeepaliveTimer are not started. If the value of the Autonomous
      System field is the same as the local Autonomous System number,
      then the connection is an "internal" connection; otherwise, it is
      an "external" connection.
      (This will impact UPDATE processing as described below.)

      If BGP message header checking (Event 21) or OPEN message checking
      (Event 22) detects an error (see Section 6.2), the local system:

        - sends a NOTIFICATION message with the appropriate error code,
        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

      Collision detection mechanisms (Section 6.8) need to be applied
      when a valid BGP OPEN message is received (Event 19 or Event 20).
      Please refer to Section 6.8 for the details of the comparison. An
      OpenCollisionDump event occurs when the BGP implementation
      determines, by a means outside the scope of this document, that a
      connection collision has occurred.

      If a connection in the OpenSent state is determined to be the
      connection that must be closed, an OpenCollisionDump (Event 23) is
      signaled to the state machine. If such an event is received in
      the OpenSent state, the local system:

        - sends a NOTIFICATION with a Cease,
        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

      If a NOTIFICATION message is received with a version error
      (Event 24), the local system:

        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection, and
        - changes its state to Idle.
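The OpenSent rules send different NOTIFICATION messages depending on the event. The minimal Python sketch below collects them in one place. The numeric error code values are an assumption taken from the NOTIFICATION message format in Section 4.5, since they are not given above. Mapping "the appropriate error code" for Events 21 and 22 to Message Header Error and OPEN Message Error follows Section 6 and is likewise an assumption of this sketch. The names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class ErrorCode(IntEnum):
    # Values assumed from the NOTIFICATION message format (Section 4.5).
    MESSAGE_HEADER_ERROR = 1
    OPEN_MESSAGE_ERROR = 2
    UPDATE_MESSAGE_ERROR = 3
    HOLD_TIMER_EXPIRED = 4
    FSM_ERROR = 5
    CEASE = 6


def opensent_notification(event: int) -> Optional[ErrorCode]:
    """Error code sent in OpenSent for `event`, or None if none is sent."""
    if event in (2, 8, 23):  # ManualStop, AutomaticStop, OpenCollisionDump
        return ErrorCode.CEASE
    if event == 10:          # HoldTimer_Expires
        return ErrorCode.HOLD_TIMER_EXPIRED
    if event == 21:          # BGPHeaderErr
        return ErrorCode.MESSAGE_HEADER_ERROR
    if event == 22:          # BGPOpenMsgErr
        return ErrorCode.OPEN_MESSAGE_ERROR
    if event in (9, 11, 12, 13, 20, 25, 26, 27, 28):  # "any other event"
        return ErrorCode.FSM_ERROR
    # Start events, TCP events 14-18, a valid OPEN (19), and a received
    # version error (24) do not cause a NOTIFICATION to be sent.
    return None
```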
      In response to any other event (Events 9, 11-13, 20, 25-28), the
      local system:

        - sends the NOTIFICATION with the Error Code Finite State Machine
          Error,
        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.

   OpenConfirm State:

      In this state, BGP waits for a KEEPALIVE or NOTIFICATION message.

      Any start event (Events 1, 3-7) is ignored in the OpenConfirm
      state.

      In response to a ManualStop event (Event 2) initiated by the
      operator, the local system:

        - sends the NOTIFICATION message with a Cease,
        - releases all BGP resources,
        - drops the TCP connection,
        - sets the ConnectRetryCounter to zero,
        - sets the ConnectRetryTimer to zero, and
        - changes its state to Idle.

      In response to the AutomaticStop event initiated by the system
      (Event 8), the local system:

        - sends the NOTIFICATION message with a Cease,
        - sets the ConnectRetryTimer to zero,
        - releases all BGP resources,
        - drops the TCP connection,
        - increments the ConnectRetryCounter by 1,
        - (optionally) performs peer oscillation damping if the
          DampPeerOscillations attribute is set to TRUE, and
        - changes its state to Idle.
If the HoldTimer_Expires event (Event 10) occurs before a KEEPALIVE
message is received, the local system:
- sends the NOTIFICATION message with the Error Code Hold Timer
  Expired,
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

If the local system receives a KeepaliveTimer_Expires event
(Event 11), the local system:
- sends a KEEPALIVE message,
- restarts the KeepaliveTimer, and
- remains in the OpenConfirm state.

In the event of a TcpConnection_Valid event (Event 14), or a TCP
connection succeeding (Event 16 or Event 17) while in OpenConfirm,
the local system needs to track the second connection.

If a TCP connection is attempted to an invalid port (Event 15), the
local system will ignore the second connection attempt.

If the local system receives a TcpConnectionFails event (Event 18)
from the underlying TCP, or a NOTIFICATION message (Event 25), the
local system:
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

If the local system receives a NOTIFICATION message with a version
error (NotifMsgVerErr (Event 24)), the local system:
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection, and
- changes its state to Idle.

If the local system receives a valid OPEN message (BGPOpen
(Event 19)), the collision detect function is processed per
Section 6.8.
If this connection is to be dropped due to connection collision, the
local system:
- sends a NOTIFICATION with a Cease,
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection (sends TCP FIN),
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

If an OPEN message is received, all fields are checked for
correctness. If the BGP message header checking (BGPHeaderErr
(Event 21)) or OPEN message check detects an error (see Section 6.2)
(BGPOpenMsgErr (Event 22)), the local system:
- sends a NOTIFICATION message with the appropriate error code,
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

If, during the processing of another OPEN message, the BGP
implementation determines, by a means outside the scope of this
document, that a connection collision has occurred and that this
connection is to be closed, the local system will issue an
OpenCollisionDump event (Event 23). When the local system receives
an OpenCollisionDump event (Event 23), the local system:
- sends a NOTIFICATION with a Cease,
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.
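The comparison itself lives in Section 6.8, which is outside this excerpt. The sketch below follows our reading of it. Both BGP Identifiers are treated as 4-octet unsigned integers. If the local Identifier is lower, the existing connection (the one already in OpenConfirm) is closed. Otherwise the connection that carried the new OPEN is closed. The function names are hypothetical.

```python
import ipaddress


def bgp_id_value(bgp_id: str) -> int:
    """Interpret a dotted-quad BGP Identifier as a 4-octet unsigned integer."""
    return int(ipaddress.IPv4Address(bgp_id))


def connection_to_close(local_id: str, remote_id: str) -> str:
    """Which connection the local system closes on a collision.

    "existing": the connection already in the OpenConfirm state.
    "new":      the connection associated with the newly received OPEN.
    """
    if bgp_id_value(local_id) < bgp_id_value(remote_id):
        return "existing"
    return "new"
```

Both sides of the collision run the same comparison. They therefore agree on which connection survives: the one initiated by the speaker with the higher BGP Identifier.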
If the local system receives a KEEPALIVE message (KeepAliveMsg
(Event 26)), the local system:
- restarts the HoldTimer, and
- changes its state to Established.

In response to any other event (Events 9, 12-13, 20, 27-28), the
local system:
- sends a NOTIFICATION with a code of Finite State Machine Error,
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

Established State:

In the Established state, the BGP FSM can exchange UPDATE,
NOTIFICATION, and KEEPALIVE messages with its peer.

Any Start event (Events 1, 3-7) is ignored in the Established state.

In response to a ManualStop event (initiated by an operator)
(Event 2), the local system:
- sends the NOTIFICATION message with Cease,
- sets the ConnectRetryTimer to zero,
- deletes all routes associated with this connection,
- releases all BGP resources,
- drops the TCP connection,
- sets the ConnectRetryCounter to zero, and
- changes its state to Idle.

In response to an AutomaticStop event (Event 8), the local system:
- sends a NOTIFICATION with Cease,
- sets the ConnectRetryTimer to zero,
- deletes all routes associated with this connection,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.
One reason for an AutomaticStop event is that a BGP speaker receives
UPDATE messages carrying prefixes from a given peer such that the
total number of prefixes received exceeds the configured maximum.
The local system then automatically disconnects the peer.

If the HoldTimer_Expires event occurs (Event 10), the local system:
- sends a NOTIFICATION message with the Error Code Hold Timer
  Expired,
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

If the KeepaliveTimer_Expires event occurs (Event 11), the local
system:
- sends a KEEPALIVE message, and
- restarts its KeepaliveTimer, unless the negotiated HoldTime value
  is zero.

Each time the local system sends a KEEPALIVE or UPDATE message, it
restarts its KeepaliveTimer, unless the negotiated HoldTime value is
zero.

A TcpConnection_Valid event (Event 14) received for a valid port
will cause the second connection to be tracked.

An invalid TCP connection (Tcp_CR_Invalid event (Event 15)) will be
ignored.

In response to an indication that the TCP connection is successfully
established (Event 16 or Event 17), the second connection SHALL be
tracked until it sends an OPEN message.

If a valid OPEN message (BGPOpen (Event 19)) is received, and if the
CollisionDetectEstablishedState optional attribute is TRUE, the OPEN
message will be checked to see if it collides (Section 6.8) with any
other connection. If the BGP implementation determines that this
connection needs to be terminated, it will process an
OpenCollisionDump event (Event 23).
If this connection needs to be terminated, the local system:
- sends a NOTIFICATION with a Cease,
- sets the ConnectRetryTimer to zero,
- deletes all routes associated with this connection,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

If the local system receives a NOTIFICATION message (Event 24 or
Event 25) or a TcpConnectionFails event (Event 18) from the
underlying TCP, the local system:
- sets the ConnectRetryTimer to zero,
- deletes all routes associated with this connection,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1, and
- changes its state to Idle.

If the local system receives a KEEPALIVE message (Event 26), the
local system:
- restarts its HoldTimer, if the negotiated HoldTime value is
  non-zero, and
- remains in the Established state.

If the local system receives an UPDATE message (Event 27), the local
system:
- processes the message,
- restarts its HoldTimer, if the negotiated HoldTime value is
  non-zero, and
- remains in the Established state.

If the local system receives an UPDATE message, and the UPDATE
message error handling procedure (see Section 6.3) detects an error
(Event 28), the local system:
- sends a NOTIFICATION message with an Update error,
- sets the ConnectRetryTimer to zero,
- deletes all routes associated with this connection,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.
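The timer rules for the Established state depend on message direction. Received KEEPALIVE and UPDATE messages restart the HoldTimer. Sent KEEPALIVE and UPDATE messages restart the KeepaliveTimer. Both restarts are skipped when the negotiated HoldTime is zero. The sketch below shows this split. The Timers class is hypothetical. The KeepaliveTimer interval of one third of the HoldTime is an assumption (a commonly suggested value), not something this section specifies.

```python
from dataclasses import dataclass


@dataclass
class Timers:
    """Hypothetical Established-state timer bookkeeping (seconds)."""
    negotiated_hold_time: int        # 0 disables both restarts below
    hold_timer: float = 0.0          # time left before HoldTimer_Expires
    keepalive_timer: float = 0.0     # time left before KeepaliveTimer_Expires

    @property
    def keepalive_interval(self) -> float:
        # Assumption: one third of the negotiated HoldTime.
        return self.negotiated_hold_time / 3

    def on_keepalive_or_update_received(self) -> None:
        # Events 26 and 27: restart the HoldTimer if HoldTime is non-zero.
        if self.negotiated_hold_time != 0:
            self.hold_timer = self.negotiated_hold_time

    def on_keepalive_or_update_sent(self) -> None:
        # Every KEEPALIVE or UPDATE sent restarts the KeepaliveTimer,
        # unless the negotiated HoldTime is zero.
        if self.negotiated_hold_time != 0:
            self.keepalive_timer = self.keepalive_interval
```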
In response to any other event (Events 9, 12-13, 20-22), the local
system:
- sends a NOTIFICATION message with the Error Code Finite State
  Machine Error,
- deletes all routes associated with this connection,
- sets the ConnectRetryTimer to zero,
- releases all BGP resources,
- drops the TCP connection,
- increments the ConnectRetryCounter by 1,
- (optionally) performs peer oscillation damping if the
  DampPeerOscillations attribute is set to TRUE, and
- changes its state to Idle.

9. UPDATE Message Handling

An UPDATE message may be received only in the Established state.
Receiving an UPDATE message in any other state is an error. When an
UPDATE message is received, each field is checked for validity, as
specified in Section 6.3.

If an optional non-transitive attribute is unrecognized, it is
quietly ignored. If an optional transitive attribute is
unrecognized, the Partial bit (the third high-order bit) in the
attribute flags octet is set to 1, and the attribute is retained for
propagation to other BGP speakers.

If an optional attribute is recognized and has a valid value, then,
depending on the type of the optional attribute, it is processed
locally, retained, and updated, if necessary, for possible
propagation to other BGP speakers.

If the UPDATE message contains a non-empty WITHDRAWN ROUTES field,
the previously advertised routes whose destinations (expressed as IP
prefixes) are contained in this field SHALL be removed from the
Adj-RIB-In. This BGP speaker SHALL run its Decision Process, since
the previously advertised route is no longer available for use.
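The handling of unrecognized optional attributes comes down to bit tests on the attribute flags octet. The Optional and Transitive bits are the first and second high-order bits (Section 4.3). The Partial bit is the third, as stated above. The sketch below applies this rule. The function name and the tuple representation of an attribute are hypothetical.

```python
OPTIONAL = 0x80     # first high-order bit of the attribute flags octet
TRANSITIVE = 0x40   # second high-order bit
PARTIAL = 0x20      # third high-order bit


def handle_unrecognized(flags: int, type_code: int, value: bytes):
    """Return the attribute to retain as (flags, type_code, value), or None."""
    if not flags & OPTIONAL:
        # An unrecognized well-known attribute is an UPDATE message error
        # (see Section 6.3); that path is not modeled in this sketch.
        raise ValueError("unrecognized well-known attribute")
    if not flags & TRANSITIVE:
        return None                                 # quietly ignored
    return (flags | PARTIAL, type_code, value)      # set Partial, retain
```

Other flag bits, such as Extended Length, pass through unchanged. Only the Partial bit is forced on.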
If the UPDATE message contains a feasible route, the Adj-RIB-In will
be updated with this route as follows: if the NLRI of the new route
is identical to the one of the route currently stored in the
Adj-RIB-In, then the new route SHALL replace the older route in the
Adj-RIB-In, thus implicitly withdrawing the older route from
service. Otherwise, if the Adj-RIB-In has no route with NLRI
identical to the new route, the new route SHALL be placed in the
Adj-RIB-In.

Once the BGP speaker updates the Adj-RIB-In, the speaker SHALL run
its Decision Process.

9.1 Decision Process

The Decision Process selects routes for subsequent advertisement by
applying the policies in the local Policy Information Base (PIB) to
the routes stored in its Adj-RIBs-In. The output of the Decision
Process is the set of routes that will be advertised to peers; the
selected routes will be stored in the local speaker's Adj-RIBs-Out,
according to policy.

The BGP Decision Process described here is conceptual. It does not
have to be implemented precisely as described, as long as the
implementations support the described functionality and exhibit the
same externally visible behavior.

The selection process is formalized by defining a function that
takes the attributes of a given route as an argument and returns
either (a) a non-negative integer denoting the degree of preference
for the route, or (b) a value denoting that this route is ineligible
to be installed in the Loc-RIB and will be excluded from the next
phase of route selection.

The function that calculates the degree of preference for a given
route SHALL NOT use any of the following as its inputs: the
existence of other routes, the non-existence of other routes, or the
path attributes of other routes.
Route selection then consists of the individual application of the
degree of preference function to each feasible route, followed by
the choice of the one with the highest degree of preference.

The Decision Process operates on routes contained in the
Adj-RIBs-In, and is responsible for:

- selection of routes to be used locally by the speaker

- selection of routes to be advertised to other BGP peers

- route aggregation and route information reduction

The Decision Process takes place in three distinct phases, each
triggered by a different event:

a) Phase 1 is responsible for calculating the degree of preference
   for each route received from a peer.

b) Phase 2 is invoked on completion of Phase 1. It is responsible
   for choosing the best route out of all those available for each
   distinct destination, and for installing each chosen route into
   the Loc-RIB.

c) Phase 3 is invoked after the Loc-RIB has been modified. It is
   responsible for disseminating routes in the Loc-RIB to each peer,
   according to the policies contained in the PIB. Route aggregation
   and information reduction can optionally be performed within this
   phase.

9.1.1 Phase 1: Calculation of Degree of Preference

The Phase 1 decision function is invoked whenever the local BGP
speaker receives, from a peer, an UPDATE message that advertises a
new route, a replacement route, or withdrawn routes.

The Phase 1 decision function is a separate process, which completes
when it has no further work to do.

The Phase 1 decision function locks an Adj-RIB-In prior to operating
on any route contained within it, and unlocks it after operating on
all new or unfeasible routes contained within it.
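The Adj-RIB-In rules at the start of Section 9 can be sketched as follows. Withdrawn prefixes are removed. An advertised prefix either replaces the stored route with identical NLRI (an implicit withdrawal) or is added. The set of touched prefixes is the work Phase 1 is invoked for. The sketch assumes a plain dictionary keyed by prefix string as a hypothetical Adj-RIB-In for one peer. It also assumes withdrawals are applied before the advertised NLRI, an ordering this section does not spell out.

```python
def apply_update(adj_rib_in: dict, withdrawn: list, nlri: list,
                 attrs: dict) -> set:
    """Apply one UPDATE to a single peer's Adj-RIB-In (prefix -> attributes).

    Returns the prefixes whose entries changed, i.e. the input to Phase 1.
    """
    changed = set()
    # Assumption: withdrawals are processed before the advertised NLRI.
    for prefix in withdrawn:
        if adj_rib_in.pop(prefix, None) is not None:
            changed.add(prefix)                 # explicit withdrawal
    for prefix in nlri:
        # Identical NLRI already stored: the new route replaces it
        # (implicit withdrawal). Otherwise the route is simply added.
        adj_rib_in[prefix] = dict(attrs)
        changed.add(prefix)
    return changed
```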
For each newly received or replacement feasible route, the local BGP
speaker determines a degree of preference as follows:

If the route is learned from an internal peer, either the value of
the LOCAL_PREF attribute is taken as the degree of preference, or
the local system computes the degree of preference of the route
based on preconfigured policy information. Note that the latter
(computing the degree of preference based on preconfigured policy
information) may result in the formation of persistent routing
loops.

If the route is learned from an external peer, then the local BGP
speaker computes the degree of preference based on preconfigured
policy information. If the return value indicates that the route is
ineligible, the route MUST NOT serve as an input to the next phase
of route selection; otherwise, the return value MUST be used as the
LOCAL_PREF value in any IBGP readvertisement.

The exact nature of this policy information, and the computation
involved, is a local matter.

9.1.2 Phase 2: Route Selection

The Phase 2 decision function is invoked on completion of Phase 1.
The Phase 2 function is a separate process, which completes when it
has no further work to do. The Phase 2 process considers all routes
that are eligible in the Adj-RIBs-In.

The Phase 2 decision function is blocked from running while the
Phase 3 decision function is in process. The Phase 2 function locks
all Adj-RIBs-In prior to commencing its function, and unlocks them
on completion.

If the NEXT_HOP attribute of a BGP route depicts an address that is
not resolvable, or if it would become unresolvable if the route were
installed in the routing table, the BGP route MUST be excluded from
the Phase 2 decision function.
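Phase 1 and the eligibility filter at the start of Phase 2 can be combined in a small sketch. For IBGP routes it takes the LOCAL_PREF option, one of the two choices the text allows. For EBGP routes it calls a policy function, which returns None for "ineligible". It then drops ineligible routes and routes whose NEXT_HOP is unresolvable. The Route fields, the policy signature and the precomputed next_hop_resolvable flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Route:
    prefix: str
    from_internal_peer: bool
    local_pref: Optional[int]       # received LOCAL_PREF (IBGP routes)
    next_hop_resolvable: bool       # outcome of Section 9.1.2.1, computed elsewhere


# Hypothetical policy: route -> degree of preference, or None if ineligible.
Policy = Callable[[Route], Optional[int]]


def degree_of_preference(route: Route, policy: Policy) -> Optional[int]:
    if route.from_internal_peer and route.local_pref is not None:
        return route.local_pref     # use LOCAL_PREF as the preference
    # EBGP route: the policy result would also become the LOCAL_PREF
    # sent in any IBGP readvertisement.
    return policy(route)


def phase2_candidates(routes: list, policy: Policy) -> list:
    """(preference, route) pairs allowed into Phase 2 selection."""
    out = []
    for r in routes:
        pref = degree_of_preference(r, policy)
        if pref is None or not r.next_hop_resolvable:
            continue                # ineligible, or unresolvable NEXT_HOP
        out.append((pref, r))
    return out
```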
If the AS_PATH attribute of a BGP route contains an AS loop, the BGP
route should be excluded from the Phase 2 decision function. AS loop
detection is done by scanning the full AS path (as specified in the
AS_PATH attribute) and checking that the autonomous system number of
the local system does not appear in the AS path. Operations of a BGP
speaker that is configured to accept routes with its own autonomous
system number in the AS path are outside the scope of this document.

It is critical that BGP speakers within an AS do not make
conflicting decisions regarding route selection that would cause
forwarding loops to occur.

For each set of destinations for which a feasible route exists in
the Adj-RIBs-In, the local BGP speaker identifies the route that:

a) has the highest degree of preference of any route to the same set
   of destinations, or

b) is the only route to that destination, or

c) is selected as a result of the Phase 2 tie-breaking rules
   specified in Section 9.1.2.2.

The local speaker SHALL then install that route in the Loc-RIB,
replacing any route to the same destination that is currently being
held in the Loc-RIB. When the new BGP route is installed in the
Routing Table, care must be taken to ensure that existing routes to
the same destination that are now considered invalid are removed
from the Routing Table. Whether or not the new BGP route replaces an
existing non-BGP route in the Routing Table depends on the policy
configured on the BGP speaker.

The local speaker MUST determine the immediate next-hop address from
the NEXT_HOP attribute of the selected route (see Section 5.1.3). If
either the immediate next hop or the IGP cost to the NEXT_HOP (where
the NEXT_HOP is resolved through an IGP route) changes, Phase 2 Route
Selection MUST be performed again.
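The AS loop check and the "highest degree of preference" rule can be sketched together. The AS_PATH is modeled as a list of (segment type, AS numbers) pairs, using the AS_SET (1) and AS_SEQUENCE (2) type codes from Section 4.3. The whole path, sets included, is scanned for the local AS. When several routes remain tied, the real procedure continues with Section 9.1.2.2. This sketch simply takes the first tied route, which is a placeholder and not the specified behavior. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

AS_SET, AS_SEQUENCE = 1, 2          # AS_PATH segment type codes

Segment = Tuple[int, List[int]]


def has_as_loop(as_path: List[Segment], local_as: int) -> bool:
    """True if local_as appears anywhere in the AS_PATH (sets or sequences)."""
    return any(local_as in asns for _seg_type, asns in as_path)


def select_best(candidates: list, local_as: int) -> Optional[str]:
    """candidates: (preference, as_path, name) tuples for one destination."""
    loop_free = [c for c in candidates if not has_as_loop(c[1], local_as)]
    if not loop_free:
        return None
    best_pref = max(c[0] for c in loop_free)
    tied = [c for c in loop_free if c[0] == best_pref]
    # Placeholder: real ties go to the Section 9.1.2.2 tie-breaking rules.
    return tied[0][2]
```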
Notice that even though BGP routes do not have to be installed in
the Routing Table with the immediate next hop(s), implementations
MUST take care that, before any packets are forwarded along a BGP
route, its associated NEXT_HOP address is resolved to the immediate
(directly connected) next-hop address, and that this address (or
multiple addresses) is finally used for actual packet forwarding.

Unresolvable routes SHALL be removed from the Loc-RIB and the
routing table. However, corresponding unresolvable routes SHOULD be
kept in the Adj-RIBs-In (in case they become resolvable).

9.1.2.1 Route Resolvability Condition

As indicated in Section 9.1.2, BGP speakers SHOULD exclude
unresolvable routes from the Phase 2 decision. This ensures that
only valid routes are installed in Loc-RIB and the Routing Table.

The route resolvability condition is defined as follows:

1. A route Rte1, referencing only the intermediate network address,
   is considered resolvable if the Routing Table contains at least
   one resolvable route Rte2 that matches Rte1's intermediate
   network address and is not recursively resolved (directly or
   indirectly) through Rte1. If multiple matching routes are
   available, only the longest matching route SHOULD be considered.

2. Routes referencing interfaces (with or without intermediate
   addresses) are considered resolvable if the state of the
   referenced interface is up and IP processing is enabled on this
   interface.

BGP routes do not refer to interfaces, but can be resolved through
the routes in the Routing Table that can be of both types (those
that specify interfaces or those that do not). IGP routes and routes
to directly connected networks are expected to specify the outbound
interface. Static routes can specify the outbound interface, or the
intermediate address, or both.
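The two resolvability rules can be sketched as a recursive check. The check carries the chain of routes already visited. As a result, a route that resolves through itself, or through a cycle of mutually recursive routes, fails. The TableRoute type and the interface-state map are hypothetical. "IP processing enabled" is folded into a single boolean per interface. To test a candidate BGP route "as if installed", the caller includes it in the table passed in.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TableRoute:
    prefix: str
    interface: Optional[str] = None   # outbound interface, if specified
    via: Optional[str] = None         # intermediate (next-hop) address


def longest_match(table: List[TableRoute], addr: str) -> Optional[TableRoute]:
    a = ipaddress.ip_address(addr)
    matches = [r for r in table if a in ipaddress.ip_network(r.prefix)]
    return max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
               default=None)


def resolvable(route: TableRoute, table: List[TableRoute],
               iface_up: Dict[str, bool], chain=frozenset()) -> bool:
    if route in chain:
        return False                  # resolves through itself: recursion
    if route.interface is not None:
        # Rule 2: interface routes need the interface up with IP enabled.
        return iface_up.get(route.interface, False)
    # Rule 1: only the longest matching route is considered, and it must
    # not itself resolve (directly or indirectly) through this route.
    nxt = longest_match(table, route.via)
    if nxt is None:
        return False
    return resolvable(nxt, table, iface_up, chain | {route})
```

The chain grows at each step and the table is finite, so the recursion always terminates.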
Note that a BGP route is considered unresolvable not only in
situations where the BGP speaker's Routing Table contains no route
matching the BGP route's NEXT_HOP. Mutually recursive routes (routes
resolving each other or themselves) also fail the resolvability
check.

It is also important that implementations do not consider feasible
routes that would become unresolvable if they were installed in the
Routing Table, even if their NEXT_HOPs are resolvable using the
current contents of the Routing Table (an example of such routes
would be mutually recursive routes). This check ensures that a BGP
speaker does not install in the Routing Table routes that will be
removed and not used by the speaker. Therefore, in addition to local
Routing Table stability, this check also improves the behavior of
the protocol in the network.

Whenever a BGP speaker identifies a route that fails the
resolvability check because of mutual recursion, an error message
SHOULD be logged.

9.1.2.2 Breaking Ties (Phase 2)

In its Adj-RIBs-In, a BGP speaker may have several routes to the
same destination that have the same degree of preference. The local
speaker can select only one of these routes for inclusion in the
associated Loc-RIB. The local speaker considers all routes with the
same degree of preference, both those received from internal peers
and those received from external peers.

The following tie-breaking procedure assumes that, for each
candidate route, all the BGP speakers within an autonomous system
can ascertain the cost of a path (interior distance) to the address
depicted by the NEXT_HOP attribute of the route, and follow the same
route selection algorithm.
The tie-breaking algorithm begins by considering all equally
preferable routes to the same destination, and then selects routes
to be removed from consideration. The algorithm terminates as soon
as only one route remains in consideration. The criteria MUST be
applied in the order specified.

Several of the criteria are described using pseudo-code. Note that
the pseudo-code shown was chosen for clarity, not efficiency. It is
not intended to specify any particular implementation. BGP
implementations MAY use any algorithm which produces the same
results as those described here.

a) Remove from consideration all routes which are not tied for
   having the smallest number of AS numbers present in their AS_PATH
   attributes. Note that when counting this number, an AS_SET counts
   as 1, no matter how many ASes are in the set.

b) Remove from consideration all routes which are not tied for
   having the lowest Origin number in their Origin attribute.

c) Remove from consideration routes with less-preferred
   MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable
   between routes learned from the same neighboring AS (the
   neighboring AS is determined from the AS_PATH attribute). Routes
   which do not have the MULTI_EXIT_DISC attribute are considered to
   have the lowest possible MULTI_EXIT_DISC value.

   This is also described in the following procedure:

     for m = all routes still under consideration
       for n = all routes still under consideration
         if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m))
           remove route m from consideration

   In the pseudo-code above, MED(n) is a function which returns the
   value of route n's MULTI_EXIT_DISC attribute. If route n has no
   MULTI_EXIT_DISC attribute, the function returns the lowest
   possible MULTI_EXIT_DISC value, i.e., 0.
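The MULTI_EXIT_DISC step of criterion c) can be written as a single filter. A route is removed if some other route from the same neighboring AS has a lower MED. Filtering against the full candidate set gives the same result as the in-place removal in the pseudo-code. The reason is that the lowest-MED route for each neighbor AS is never removed, so it remains available as the witness for every route it eliminates. The Candidate type is hypothetical. neighbor_as stands for neighborAS(n), assumed to be precomputed as described in the next paragraph.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    neighbor_as: int                # neighborAS(n), computed elsewhere
    med: Optional[int]              # None when MULTI_EXIT_DISC is absent


def med(r: Candidate) -> int:
    # A missing MULTI_EXIT_DISC counts as the lowest possible value, 0.
    return 0 if r.med is None else r.med


def remove_less_preferred_med(routes: List[Candidate]) -> List[Candidate]:
    """Criterion c): drop m if a route n from the same neighbor AS
    has a lower MED."""
    return [m for m in routes
            if not any(n.neighbor_as == m.neighbor_as and med(n) < med(m)
                       for n in routes)]
```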
   Similarly, neighborAS(n) is a function which returns the neighbor
   AS from which the route was received. If the route is learned via
   IBGP, and the other IBGP speaker didn't originate the route, it is
   the neighbor AS from which the other IBGP speaker learned the
   route. If the route is learned via IBGP, and the other IBGP
   speaker either (a) originated the route, or (b) created the route
   by aggregation and the AS_PATH attribute of the aggregate route is
   either empty or begins with an AS_SET, it is the local AS.

   If a MULTI_EXIT_DISC attribute is removed before re-advertising a
   route into IBGP, then comparison based on the received EBGP
   MULTI_EXIT_DISC attribute MAY still be performed. If an
   implementation chooses to remove MULTI_EXIT_DISC, then the
   optional comparison on MULTI_EXIT_DISC, if performed at all, MUST
   be performed only among EBGP-learned routes. The best EBGP-learned
   route may then be compared with IBGP-learned routes after the
   removal of the MULTI_EXIT_DISC attribute. If MULTI_EXIT_DISC is
   removed from a subset of EBGP-learned routes, and the selected
   "best" EBGP-learned route will not have MULTI_EXIT_DISC removed,
   then the MULTI_EXIT_DISC must be used in the comparison with
   IBGP-learned routes. For IBGP-learned routes, the MULTI_EXIT_DISC
   MUST be used in route comparisons which reach this step in the
   Decision Process. Including the MULTI_EXIT_DISC of an EBGP-learned
   route in the comparison with an IBGP-learned route, then removing
   the MULTI_EXIT_DISC attribute and advertising the route, has been
   proven to cause route loops.

d) If at least one of the candidate routes was received via EBGP,
   remove from consideration all routes which were received via IBGP.

e) Remove from consideration any routes with less-preferred interior
   cost.
   The interior cost of a route is determined by calculating the
   metric to the NEXT_HOP for the route using the Routing Table. If
   the NEXT_HOP for a route is reachable, but no cost can be
   determined, then this step should be skipped (equivalently,
   consider all routes to have equal costs).

   This is also described in the following procedure:

     for m = all routes still under consideration
       for n = all routes still under consideration
         if (cost(n) is lower than cost(m))
           remove m from consideration

   In the pseudo-code above, cost(n) is a function which returns the
   cost of the path (interior distance) to the address given in the
   NEXT_HOP attribute of the route.

f) Remove from consideration all routes other than the route that
   was advertised by the BGP speaker whose BGP Identifier has the
   lowest value.

g) Prefer the route received from the lowest peer address.

9.1.3 Phase 3: Route Dissemination

The Phase 3 decision function is invoked on completion of Phase 2,
or when any of the following events occur:

a) when routes in the Loc-RIB to local destinations have changed

b) when locally generated routes learned by means outside of BGP
   have changed

c) when a new BGP speaker-to-BGP speaker connection has been
   established

The Phase 3 function is a separate process, which completes when it
has no further work to do. The Phase 3 Routing Decision function is
blocked from running while the Phase 2 decision function is in
process.

All routes in the Loc-RIB are processed into Adj-RIBs-Out according
to configured policy. This policy MAY exclude a route in the Loc-RIB
from being installed in a particular Adj-RIB-Out. A route SHALL NOT
be installed in the Adj-RIB-Out unless the destination and NEXT_HOP
described by this route may be forwarded appropriately by the
Routing Table.
If a route in the Loc-RIB is excluded from a particular Adj-RIB-Out,
the previously advertised route in that Adj-RIB-Out MUST be
withdrawn from service by means of an UPDATE message (see
Section 9.2).

Route aggregation and information reduction techniques (see
Section 9.2.2.1) may optionally be applied.

Any local policy which results in routes being added to an
Adj-RIB-Out without also being added to the local BGP speaker's
forwarding table is outside the scope of this document.

When the updating of the Adj-RIBs-Out and the Routing Table is
complete, the local BGP speaker runs the Update-Send process of
Section 9.2.

9.1.4 Overlapping Routes

A BGP speaker may transmit routes with overlapping Network Layer
Reachability Information (NLRI) to another BGP speaker. NLRI overlap
occurs when a set of destinations is identified in non-matching
multiple routes. Since BGP encodes NLRI using IP prefixes, overlap
will always exhibit subset relationships. A route describing a
smaller set of destinations (a longer prefix) is said to be more
specific than a route describing a larger set of destinations (a
shorter prefix); similarly, a route describing a larger set of
destinations is said to be less specific than a route describing a
smaller set of destinations.

The precedence relationship effectively decomposes less specific
routes into two parts:

- a set of destinations described only by the less specific route,
  and

- a set of destinations described by the overlap of the less
  specific and the more specific routes

The set of destinations described by the overlap represents a
portion of the less specific route that is feasible, but is not
currently in use. If a more specific route is later withdrawn, the
set of destinations described by the overlap will still be reachable
using the less specific route.
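The two-part decomposition of a less specific route can be computed with Python's standard ipaddress module. address_exclude yields the prefixes covered only by the less specific route. The more specific prefix itself is the overlap. The decompose function is a hypothetical helper written for illustration.

```python
import ipaddress


def decompose(less_specific: str, more_specific: str):
    """Split a less specific prefix into (only-less, overlap) prefix lists."""
    less = ipaddress.ip_network(less_specific)
    more = ipaddress.ip_network(more_specific)
    if more == less or not more.subnet_of(less):
        raise ValueError("not a less/more specific pair")
    only_less = sorted(less.address_exclude(more))   # described only by `less`
    overlap = [more]                                 # covered by both routes
    return only_less, overlap
```

For example, decompose("192.0.2.0/24", "192.0.2.128/25") yields 192.0.2.0/25 as the part described only by the /24, and 192.0.2.128/25 as the overlap. That overlap stays reachable through the /24 if the /25 is withdrawn.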
If a BGP speaker receives overlapping routes, the Decision Process
MUST consider both routes based on the configured acceptance policy.
If both a less and a more specific route are accepted, then the
Decision Process MUST either install both the less and the more
specific routes in the Loc-RIB, or aggregate the two routes and
install the aggregated route in the Loc-RIB, provided that both
routes have the same value of the NEXT_HOP attribute.

If a BGP speaker chooses to aggregate, then it SHOULD either include
all ASes used to form the aggregate in an AS_SET, or add the
ATOMIC_AGGREGATE attribute to the route. This attribute is now
primarily informational. With the elimination of IP routing
protocols that do not support classless routing, and the elimination
of router and host implementations that do not support classless
routing, there is no longer a need to de-aggregate. Routes SHOULD
NOT be de-aggregated. In particular, a route that carries the
ATOMIC_AGGREGATE attribute MUST NOT be de-aggregated. That is, the
NLRI of this route cannot be made more specific. Forwarding along
such a route does not guarantee that IP packets will actually
traverse only ASes listed in the AS_PATH attribute of the route.

9.2 Update-Send Process

The Update-Send process is responsible for advertising UPDATE
messages to all peers. For example, it distributes the routes chosen
by the Decision Process to other BGP speakers, which may be located
in either the same autonomous system or a neighboring autonomous
system.

When a BGP speaker receives an UPDATE message from an internal peer,
the receiving BGP speaker SHALL NOT re-distribute the routing
information contained in that UPDATE message to other internal
peers (unless the speaker acts as a BGP Route Reflector [RFC2796]).
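The IBGP re-distribution rule above reduces to one check on where a route was learned and where it is going. The sketch below encodes only that rule. Every other combination is left to configured policy, which is not modeled. The peer-type labels and the function name are hypothetical.

```python
def may_advertise(learned_from: str, to_peer: str,
                  is_route_reflector: bool = False) -> bool:
    """Apply the IBGP re-distribution rule of Section 9.2.

    learned_from / to_peer: "ibgp", "ebgp", or "local" (hypothetical labels).
    """
    if learned_from == "ibgp" and to_peer == "ibgp":
        # Not re-distributed to other internal peers unless this speaker
        # acts as a BGP Route Reflector [RFC2796].
        return is_route_reflector
    return True   # all other cases are governed by policy (not modeled)
```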
As part of Phase 3 of the route selection process, the BGP speaker has updated its Adj-RIBs-Out. All newly installed routes and all newly unfeasible routes for which there is no replacement route SHALL be advertised to its peers by means of an UPDATE message.

A BGP speaker SHOULD NOT advertise a given feasible BGP route from its Adj-RIB-Out if it would produce an UPDATE message containing the same BGP route as was previously advertised.

Any routes in the Loc-RIB marked as unfeasible SHALL be removed. Changes to the reachable destinations within its own autonomous system SHALL also be advertised in an UPDATE message.

If, due to the limits on the maximum size of an UPDATE message (see Section 4), a single route doesn't fit into the message, the BGP speaker MUST NOT advertise the route to its peers and MAY choose to log an error locally.

9.2.1 Controlling Routing Traffic Overhead

The BGP protocol constrains the amount of routing traffic (that is, UPDATE messages) in order to limit both the link bandwidth needed to advertise UPDATE messages and the processing power needed by the Decision Process to digest the information contained in the UPDATE messages.

9.2.1.1 Frequency of Route Advertisement

The parameter MinRouteAdvertisementIntervalTimer determines the minimum amount of time that must elapse between advertisement and/or withdrawal of routes to a particular destination by a BGP speaker to a peer. This rate limiting procedure applies on a per-destination basis, although the value of MinRouteAdvertisementIntervalTimer is set on a per BGP peer basis.

Two UPDATE messages sent by a BGP speaker to a peer that advertise feasible routes and/or withdrawal of unfeasible routes to some common set of destinations MUST be separated by at least MinRouteAdvertisementIntervalTimer.
Clearly, this can only be achieved precisely by keeping a separate timer for each common set of destinations. This would be unwarranted overhead. Any technique that ensures that the interval between two UPDATE messages sent from a BGP speaker to a peer that advertise feasible routes and/or withdrawal of unfeasible routes to some common set of destinations will be at least MinRouteAdvertisementIntervalTimer, and that also ensures a constant upper bound on the interval, is acceptable.

Since fast convergence is needed within an autonomous system, either (a) the MinRouteAdvertisementIntervalTimer used for internal peers SHOULD be shorter than the MinRouteAdvertisementIntervalTimer used for external peers, or (b) the procedure described in this section SHOULD NOT apply for routes sent to internal peers.

This procedure does not limit the rate of route selection, but only the rate of route advertisement. If new routes are selected multiple times while awaiting the expiration of MinRouteAdvertisementIntervalTimer, the last route selected SHALL be advertised at the end of MinRouteAdvertisementIntervalTimer.

9.2.1.2 Frequency of Route Origination

The parameter MinASOriginationIntervalTimer determines the minimum amount of time that must elapse between successive advertisements of UPDATE messages that report changes within the advertising BGP speaker's own autonomous system.

9.2.2 Efficient Organization of Routing Information

Having selected the routing information that it will advertise, a BGP speaker may avail itself of several methods to organize this information in an efficient manner.
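Returning to the rate limiting of Section 9.2.1.1: one technique that meets the requirement without a timer per destination is to keep a single MinRouteAdvertisementIntervalTimer per peer. Any two UPDATE messages to that peer (and therefore any two concerning a common destination) are then at least one interval apart, and a pending change waits at most one interval. The sketch below is illustrative only and rests on assumptions: the class name MraiPeer is hypothetical, time is passed in explicitly in seconds rather than driven by real timers, and routes are opaque strings, with "withdrawn" standing in for a withdrawal. A later selection for the same destination overwrites an earlier pending one, so the last route selected is the one advertised when the timer expires.

```python
from typing import Dict, List, Optional, Tuple


class MraiPeer:
    """Hypothetical per-peer MinRouteAdvertisementIntervalTimer with batching."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.timer_expires: Optional[float] = None  # None: timer not running
        self.pending: Dict[str, str] = {}  # destination -> latest selection
        self.sent: List[Tuple[float, Dict[str, str]]] = []  # (time, UPDATE contents)

    def route_changed(self, now: float, destination: str, route: str) -> None:
        """Record a new selection; send at once if the timer is idle or expired."""
        self.pending[destination] = route  # last selection wins
        if self.timer_expires is None or now >= self.timer_expires:
            self._send(now)

    def tick(self, now: float) -> None:
        """Call when the timer fires: send what accumulated, or stop the timer."""
        if self.timer_expires is not None and now >= self.timer_expires:
            if self.pending:
                self._send(now)
            else:
                self.timer_expires = None

    def _send(self, now: float) -> None:
        self.sent.append((now, dict(self.pending)))
        self.pending.clear()
        self.timer_expires = now + self.interval


peer = MraiPeer(interval=30.0)
peer.route_changed(0.0, "192.0.2.0/24", "path-A")   # sent immediately
peer.route_changed(10.0, "192.0.2.0/24", "path-B")  # held by the timer
peer.route_changed(20.0, "192.0.2.0/24", "path-C")  # replaces path-B
peer.tick(30.0)                                     # path-C sent at t=30
print(peer.sent)
```

Because the timer is per peer, this is stricter than the per-destination requirement: unrelated destinations are also held back, which trades some latency for a single timer per peer.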
9.2.2.1 Information Reduction

Information reduction may imply a reduction in granularity of policy control: after information is collapsed, the same policies will apply to all destinations and paths in the equivalence class.

The Decision Process may optionally reduce the amount of information that it will place in the Adj-RIBs-Out by any of the following methods:

a) Network Layer Reachability Information (NLRI):

   Destination IP addresses can be represented as IP address prefixes. In cases where there is a correspondence between the address structure and the systems under control of an autonomous system administrator, it will be possible to reduce the size of the NLRI carried in the UPDATE messages.

b) AS_PATHs:

   AS path information can be represented as ordered AS_SEQUENCEs or unordered AS_SETs. AS_SETs are used in the route aggregation algorithm described in 9.2.2.2. They reduce the size of the AS_PATH information by listing each AS number only once, regardless of how many times it may have appeared in multiple AS_PATHs that were aggregated.

   An AS_SET implies that the destinations listed in the NLRI can be reached through paths that traverse at least some of the constituent autonomous systems. AS_SETs provide sufficient information to avoid routing information looping; however, their use may prune potentially feasible paths, since such paths are no longer listed individually in the form of AS_SEQUENCEs. In practice this is not likely to be a problem, since once an IP packet arrives at the edge of a group of autonomous systems, the BGP speaker at that point is likely to have more detailed path information and can distinguish individual paths to destinations.
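Both reduction methods can be illustrated with Python's standard ipaddress module. In this sketch, which is an illustration rather than a prescribed procedure, reduce_nlri merges adjacent or overlapping prefixes with ipaddress.collapse_addresses, and as_set_members lists each AS number once no matter how many aggregated AS_PATHs contain it. Both function names are hypothetical, and AS_PATHs are simplified to plain lists of AS numbers. Collapsing NLRI this way applies the same policy to every destination in the result, which is the loss of granularity described above.

```python
import ipaddress
from typing import Iterable, List


def reduce_nlri(prefixes: Iterable[str]) -> List[str]:
    """Collapse same-family prefixes into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


def as_set_members(as_paths: Iterable[List[int]]) -> List[int]:
    """List each AS number once, however many AS_PATHs contain it.

    Sorting is optional (see the AS_SET sorting note in Appendix F.4).
    """
    return sorted({asn for path in as_paths for asn in path})


print(reduce_nlri(["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"]))
print(as_set_members([[64500, 64501], [64500, 64502, 64501]]))
```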
9.2.2.2 Aggregating Routing Information

Aggregation is the process of combining the characteristics of several different routes in such a way that a single route can be advertised. Aggregation can occur as part of the Decision Process to reduce the amount of routing information that will be placed in the Adj-RIBs-Out.

Aggregation reduces the amount of information that a BGP speaker must store and exchange with other BGP speakers. Routes can be aggregated by applying the following procedure separately to path attributes of the same type and to the Network Layer Reachability Information.

Routes that have different MULTI_EXIT_DISC attributes SHALL NOT be aggregated.

If the aggregated route has an AS_SET as the first element in its AS_PATH attribute, then the router that originates the route SHOULD NOT advertise the MULTI_EXIT_DISC attribute with this route.

Path attributes that have different type codes cannot be aggregated together. Path attributes of the same type code may be aggregated, according to the following rules:

NEXT_HOP:
   When aggregating routes that have different NEXT_HOP attributes, the NEXT_HOP attribute of the aggregated route SHALL identify an interface on the BGP speaker that performs the aggregation.

ORIGIN attribute:
   If at least one route among routes that are aggregated has ORIGIN with the value INCOMPLETE, then the aggregated route MUST have the ORIGIN attribute with the value INCOMPLETE. Otherwise, if at least one route among routes that are aggregated has ORIGIN with the value EGP, then the aggregated route MUST have the ORIGIN attribute with the value EGP. In all other cases, the value of the ORIGIN attribute of the aggregated route is IGP.
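The MULTI_EXIT_DISC, NEXT_HOP, and ORIGIN rules above reduce to a few lines. In the sketch below, the ORIGIN codes IGP = 0, EGP = 1, and INCOMPLETE = 2 are the values BGP carries on the wire, so the precedence of INCOMPLETE over EGP over IGP is simply the maximum. The function names, the use of None for an absent MULTI_EXIT_DISC, and keeping a NEXT_HOP that all routes share (a case the rule above does not address) are assumptions made for illustration.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Origin(IntEnum):
    IGP = 0
    EGP = 1
    INCOMPLETE = 2


def can_aggregate(meds: Iterable[Optional[int]]) -> bool:
    """Routes with different MULTI_EXIT_DISC values SHALL NOT be aggregated.

    None stands for "attribute absent" (an assumption of this sketch).
    """
    return len(set(meds)) <= 1


def aggregate_origin(origins: Iterable[Origin]) -> Origin:
    """INCOMPLETE if any route is INCOMPLETE, else EGP if any is EGP, else IGP."""
    return max(origins, default=Origin.IGP)


def aggregate_next_hop(next_hops: Iterable[str], local_interface: str) -> str:
    """Use an interface address of the aggregating speaker when NEXT_HOPs differ."""
    distinct = set(next_hops)
    if len(distinct) == 1:
        return distinct.pop()  # assumption: keep a NEXT_HOP shared by all routes
    return local_interface


print(aggregate_origin([Origin.IGP, Origin.EGP]).name)
```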
AS_PATH attribute:
   If routes to be aggregated have identical AS_PATH attributes, then the aggregated route has the same AS_PATH attribute as each individual route.

   For the purpose of aggregating AS_PATH attributes, we model each AS within the AS_PATH attribute as a tuple <type, value>, where "type" identifies a type of the path segment the AS belongs to (e.g., AS_SEQUENCE, AS_SET), and "value" is the AS number. If the routes to be aggregated have different AS_PATH attributes, then the aggregated AS_PATH attribute SHALL satisfy all of the following conditions:

   - all tuples of type AS_SEQUENCE in the aggregated AS_PATH SHALL appear in all of the AS_PATHs in the initial set of routes to be aggregated.

   - all tuples of type AS_SET in the aggregated AS_PATH SHALL appear in at least one of the AS_PATHs in the initial set (they may appear as either AS_SET or AS_SEQUENCE types).

   - for any tuple X of type AS_SEQUENCE in the aggregated AS_PATH that precedes tuple Y in the aggregated AS_PATH, X precedes Y in each AS_PATH in the initial set that contains Y, regardless of the type of Y.

   - no tuple of type AS_SET with the same value SHALL appear more than once in the aggregated AS_PATH.

   - multiple tuples of type AS_SEQUENCE with the same value may appear in the aggregated AS_PATH only when adjacent to another tuple of the same type and value.

   An implementation may choose any algorithm that conforms to these rules. At a minimum, a conformant implementation SHALL be able to perform the following algorithm, which meets all of the above conditions:

   - determine the longest leading sequence of tuples (as defined above) common to all the AS_PATH attributes of the routes to be aggregated. Make this sequence the leading sequence of the aggregated AS_PATH attribute.
   - set the type of the rest of the tuples from the AS_PATH attributes of the routes to be aggregated to AS_SET, and append them to the aggregated AS_PATH attribute.

   - if the aggregated AS_PATH has more than one tuple with the same value (regardless of the tuple's type), eliminate all but one such tuple by deleting tuples of the type AS_SET from the aggregated AS_PATH attribute.

   - for each pair of adjacent tuples in the aggregated AS_PATH, if both tuples have the same type, merge them together, as long as doing so will not cause a segment with length greater than 255 to be generated.

   Appendix F, Section F.6 presents another algorithm that satisfies the conditions and allows for more complex policy configurations.

ATOMIC_AGGREGATE:
   If at least one of the routes to be aggregated has the ATOMIC_AGGREGATE path attribute, then the aggregated route SHALL have this attribute as well.

AGGREGATOR:
   Any AGGREGATOR attributes from the routes to be aggregated MUST NOT be included in the aggregated route. The BGP speaker performing the route aggregation MAY attach a new AGGREGATOR attribute (see Section 5.1.7).

9.3 Route Selection Criteria

Generally speaking, additional rules for comparing routes among several alternatives are outside the scope of this document. There are two exceptions:

- If the local AS appears in the AS path of the new route being considered, then that new route cannot be viewed as better than any other route (provided that the speaker is configured to accept such routes). If such a route were ever used, a routing loop could result.

- In order to achieve successful distributed operation, only routes with a likelihood of stability can be chosen.
  Thus, an AS SHOULD avoid using unstable routes, and it SHOULD NOT make rapid spontaneous changes to its choice of route. Quantifying the terms "unstable" and "rapid" in the previous sentence will require experience, but the principle is clear. Routes that are unstable can be "penalized" (e.g., by using the procedures described in [RFC2439]).

9.4 Originating BGP routes

A BGP speaker may originate BGP routes by injecting routing information acquired by some other means (e.g., via an IGP) into BGP. A BGP speaker that originates BGP routes assigns the degree of preference (e.g., according to local configuration) to these routes by passing them through the Decision Process (see Section 9.1). These routes MAY also be distributed to other BGP speakers within the local AS as part of the update process (see Section 9.2). The decision of whether to distribute non-BGP acquired routes within an AS via BGP depends on the environment within the AS (e.g., type of IGP) and SHOULD be controlled via configuration.

10 BGP Timers

BGP employs five timers: ConnectRetryTimer (see Section 8), HoldTimer (see Section 4.2), KeepaliveTimer (see Section 8), MinASOriginationIntervalTimer (see Section 9.2.1.2), and MinRouteAdvertisementIntervalTimer (see Section 9.2.1.1).

Two optional timers MAY be supported by BGP: DelayOpenTimer and IdleHoldTimer. Section 8 describes their use. The full operation of these optional timers is outside the scope of this document.

ConnectRetryTime is a mandatory FSM attribute that stores the initial value for the ConnectRetryTimer. The suggested default value for the ConnectRetryTime is 120 seconds.

HoldTime is a mandatory FSM attribute that stores the initial value for the HoldTimer. The suggested default value for the HoldTime is 90 seconds.
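The suggested defaults given in this section, together with the jitter rule it goes on to describe (multiply the base value by a random factor uniformly distributed between 0.75 and 1.0, picked afresh each time the timer is set), can be collected in a small table. The sketch below is illustrative only: the DEFAULTS dictionary, the jittered_value function, and the use of Python's random.uniform as the random source are assumptions. The HoldTimer is returned unjittered because it is not among the timers the jitter recommendation lists.

```python
import random
from typing import Optional

# Suggested default values from this section, in seconds.
DEFAULTS = {
    "ConnectRetryTimer": 120.0,
    "HoldTimer": 90.0,
    "KeepaliveTimer": 90.0 / 3,  # 1/3 of the HoldTime
    "MinASOriginationIntervalTimer": 15.0,
    "MinRouteAdvertisementIntervalTimer/EBGP": 30.0,
    "MinRouteAdvertisementIntervalTimer/IBGP": 5.0,
}

# Timers to which jitter SHOULD be applied.
JITTERED_PREFIXES = ("ConnectRetryTimer", "KeepaliveTimer",
                     "MinASOriginationIntervalTimer",
                     "MinRouteAdvertisementIntervalTimer")


def jittered_value(timer: str, rng: Optional[random.Random] = None,
                   low: float = 0.75, high: float = 1.0) -> float:
    """Return the value to load into `timer`; a new factor is drawn on every call."""
    base = DEFAULTS[timer]
    if not timer.startswith(JITTERED_PREFIXES):
        return base
    source = rng if rng is not None else random
    return base * source.uniform(low, high)


print(jittered_value("ConnectRetryTimer"), jittered_value("HoldTimer"))
```

Passing a seeded random.Random makes the values reproducible in tests; the low and high parameters reflect that the jitter range MAY be configurable.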
During some portions of the state machine (see Section 8), the HoldTimer is set to a large value. The suggested default for this large value is 4 minutes.

The KeepaliveTime is a mandatory FSM attribute that stores the initial value for the KeepaliveTimer. The suggested default value for the KeepaliveTime is 1/3 of the HoldTime.

The suggested default value for the MinASOriginationIntervalTimer is 15 seconds.

The suggested default value for the MinRouteAdvertisementIntervalTimer on EBGP connections is 30 seconds.

The suggested default value for the MinRouteAdvertisementIntervalTimer on IBGP connections is 5 seconds.

An implementation of BGP MUST allow the HoldTimer to be configurable on a per peer basis, and MAY allow the other timers to be configurable.

To minimize the likelihood that the distribution of BGP messages by a given BGP speaker will contain peaks, jitter SHOULD be applied to the timers associated with MinASOriginationIntervalTimer, KeepaliveTimer, MinRouteAdvertisementIntervalTimer, and ConnectRetryTimer. A given BGP speaker MAY apply the same jitter to each of these quantities regardless of the destinations to which the updates are being sent; that is, jitter need not be configured on a "per peer" basis.

The suggested default amount of jitter SHALL be determined by multiplying the base value of the appropriate timer by a random factor which is uniformly distributed in the range from 0.75 to 1.0. A new random value SHOULD be picked each time the timer is set. The range of the jitter random value MAY be configurable.

Appendix A. Comparison with RFC1771

There are numerous editorial changes (too many to list here).
The following is a list of the technical changes:

   Changes to reflect the usages of such features as TCP MD5 [RFC2385], BGP Route Reflectors [RFC2796], BGP Confederations [RFC3065], and BGP Route Refresh [RFC2918].

   Clarification of the use of the BGP Identifier in the AGGREGATOR attribute.

   Procedures for imposing an upper bound on the number of prefixes that a BGP speaker would accept from a peer.

   The ability of a BGP speaker to include more than one instance of its own AS in the AS_PATH attribute for the purpose of inter-AS traffic engineering.

   Clarifications of the various types of NEXT_HOPs.

   Clarifications to the use of the ATOMIC_AGGREGATE attribute.

   The relationship between the immediate next hop and the next hop as specified in the NEXT_HOP path attribute.

   Clarifications of the tie-breaking procedures.

   Clarifications of the frequency of route advertisements.

   Optional Parameter Type 1 (Authentication Information) has been deprecated.

   UPDATE Message Error subcode 7 (AS Routing Loop) has been deprecated.

   OPEN Message Error subcode 5 (Authentication Failure) has been deprecated.

   Use of the Marker field for authentication has been deprecated.

   Implementations MUST support TCP MD5 [RFC2385] for authentication.

   Clarification of the BGP FSM.

Appendix B. Comparison with RFC1267

All the changes listed in Appendix A, plus the following.

BGP-4 is capable of operating in an environment where a set of reachable destinations may be expressed via a single IP prefix. The concept of network classes, or subnetting, is foreign to BGP-4. To accommodate these capabilities, BGP-4 changes the semantics and encoding associated with the AS_PATH attribute. New text has been added to define semantics associated with IP prefixes.
These abilities allow BGP-4 to support the proposed supernetting scheme [9].

To simplify configuration, this version introduces a new attribute, LOCAL_PREF, that facilitates route selection procedures.

The INTER_AS_METRIC attribute has been renamed to MULTI_EXIT_DISC.

A new attribute, ATOMIC_AGGREGATE, has been introduced to ensure that certain aggregates are not de-aggregated. Another new attribute, AGGREGATOR, can be added to aggregate routes in order to advertise which AS and which BGP speaker within that AS caused the aggregation.

To ensure that Hold Timers are symmetric, the Hold Timer is now negotiated on a per-connection basis. Hold Timers of zero are now supported.

Appendix C. Comparison with RFC 1163

All of the changes listed in Appendices A and B, plus the following.

To detect and recover from BGP connection collision, a new field (BGP Identifier) has been added to the OPEN message. New text (Section 6.8) has been added to specify the procedure for detecting and recovering from collision.

The new document no longer restricts the router that is passed in the NEXT_HOP path attribute to be part of the same Autonomous System as the BGP Speaker.

The new document optimizes and simplifies the exchange of information about previously reachable routes.

Appendix D. Comparison with RFC 1105

All of the changes listed in Appendices A, B, and C, plus the following.

Minor changes to the RFC1105 Finite State Machine were necessary to accommodate the TCP user interface provided by 4.3 BSD.

The notion of Up/Down/Horizontal relations present in RFC1105 has been removed from the protocol.

The changes in the message format from RFC1105 are as follows:

   1. The Hold Time field has been removed from the BGP header and added to the OPEN message.

   2. The version field has been removed from the BGP header and added to the OPEN message.

   3. The Link Type field has been removed from the OPEN message.

   4. The OPEN CONFIRM message has been eliminated and replaced with implicit confirmation provided by the KEEPALIVE message.

   5. The format of the UPDATE message has been changed significantly. New fields were added to the UPDATE message to support multiple path attributes.

   6. The Marker field has been expanded and its role broadened to support authentication.

Note that quite often BGP, as specified in RFC 1105, is referred to as BGP-1; BGP, as specified in RFC 1163, is referred to as BGP-2; BGP, as specified in RFC 1267, is referred to as BGP-3; and BGP, as specified in this document, is referred to as BGP-4.

Appendix E. TCP options that may be used with BGP

If a local system TCP user interface supports the TCP PUSH function, then each BGP message SHOULD be transmitted with the PUSH flag set. Setting the PUSH flag forces BGP messages to be transmitted promptly to the receiver.

If a local system TCP user interface supports setting the DSCP field [RFC2474] for TCP connections, then the TCP connection used by BGP SHOULD be opened with bits 0-2 of the DSCP field set to 110 (binary).

An implementation MUST support the TCP MD5 option [RFC2385].

Appendix F. Implementation Recommendations

This section presents some implementation recommendations.

Appendix F.1 Multiple Networks Per Message

The BGP protocol allows for multiple address prefixes with the same path attributes to be specified in one message. Making use of this capability is highly recommended. With one address prefix per message, there is a substantial increase in overhead in the receiver.
Not only does the system overhead increase due to the reception of multiple messages, but the overhead of scanning the routing table for updates to BGP peers and other routing protocols (and sending the associated messages) is incurred multiple times as well.

One method of building messages containing many address prefixes per path attribute set, from a routing table that is not organized on a per-path-attribute-set basis, is to build many messages as the routing table is scanned. As each address prefix is processed, a message for the associated set of path attributes is allocated, if it does not exist, and the new address prefix is added to it. If such a message exists, the new address prefix is appended to it. If the message lacks the space to hold the new address prefix, it is transmitted, a new message is allocated, and the new address prefix is inserted into the new message. When the entire routing table has been scanned, all allocated messages are sent and their resources released. Maximum compression is achieved when all the destinations covered by the address prefixes share a common set of path attributes, making it possible to send many address prefixes in one 4096-byte message.

When peering with a BGP implementation that does not compress multiple address prefixes into one message, it may be necessary to take steps to reduce the overhead from the flood of data received when a peer is acquired or a significant network topology change occurs. One method of doing this is to limit the rate of updates. This will eliminate the redundant scanning of the routing table to provide flash updates for BGP peers and other routing protocols. A disadvantage of this approach is that it increases the propagation latency of routing information.
By choosing a minimum flash update interval that is not much greater than the time it takes to process the multiple messages, this latency should be minimized. A better method would be to read all received messages before sending updates.

Appendix F.2 Reducing route flapping

To avoid excessive route flapping, a BGP speaker that needs to withdraw a destination and send an update about a more specific or less specific route should combine them into the same UPDATE message.

Appendix F.3 Path attribute ordering

Implementations that combine update messages (as described above in Appendix F.1) may prefer to see all path attributes presented in a known order. This permits them to quickly identify sets of attributes from different update messages that are semantically identical. To facilitate this, it is a useful optimization to order the path attributes according to type code. This optimization is entirely optional.

Appendix F.4 AS_SET sorting

Another useful optimization that can be done to simplify this situation is to sort the AS numbers found in an AS_SET. This optimization is entirely optional.

Appendix F.5 Control over version negotiation

Since BGP-4 is capable of carrying aggregated routes that cannot be properly represented in BGP-3, an implementation that supports BGP-4 and another BGP version should provide the capability to only speak BGP-4 on a per-peer basis.

Appendix F.6 Complex AS_PATH aggregation

An implementation that chooses to provide a path aggregation algorithm that retains significant amounts of path information may wish to use the following procedure:

   For the purpose of aggregating AS_PATH attributes of two routes, we model each AS as a tuple <type, value>, where "type" identifies a type of the path segment the AS belongs to (e.g.,
   AS_SEQUENCE, AS_SET), and "value" is the AS number. Two ASs are said to be the same if their corresponding tuples are the same.

   The algorithm to aggregate two AS_PATH attributes works as follows:

   a) Identify the same ASs (as defined above) within each AS_PATH attribute that are in the same relative order within both AS_PATH attributes. Two ASs, X and Y, are said to be in the same order if either:

      - X precedes Y in both AS_PATH attributes, or
      - Y precedes X in both AS_PATH attributes.

   b) The aggregated AS_PATH attribute consists of the ASs identified in (a), in exactly the same order as they appear in the AS_PATH attributes to be aggregated. If two consecutive ASs identified in (a) do not immediately follow each other in both of the AS_PATH attributes to be aggregated, then the intervening ASs (ASs that are between the two consecutive ASs that are the same) in both attributes are combined into an AS_SET path segment that consists of the intervening ASs from both AS_PATH attributes; this segment is then placed between the two consecutive ASs identified in (a) of the aggregated attribute. If two consecutive ASs identified in (a) immediately follow each other in one attribute, but do not follow each other in the other, then the intervening ASs of the latter are combined into an AS_SET path segment; this segment is then placed between the two consecutive ASs identified in (a) of the aggregated attribute.

   c) For each pair of adjacent tuples in the aggregated AS_PATH, if both tuples have the same type, merge them together, as long as doing so will not cause a segment with length greater than 255 to be generated.
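Steps (a) through (c) can be sketched in Python for the simple case where both inputs are plain AS_SEQUENCE paths, so every tuple has type AS_SEQUENCE and two ASs are the same exactly when their AS numbers match. Step (a) is implemented here as a longest common subsequence, which is one reasonable reading of "ASs in the same relative order" and is an assumption of this sketch. The function name aggregate_as_paths is hypothetical, and placing any ASs that come before the first or after the last common AS into an AS_SET is also an assumption, since the procedure only describes ASs between two common ones. The rightmost-occurrence rule that follows the steps is not applied.

```python
from typing import List, Tuple

AS_SET = "AS_SET"
AS_SEQUENCE = "AS_SEQUENCE"
MAX_SEGMENT_LEN = 255


def _common_in_order(a: List[int], b: List[int]) -> List[Tuple[int, int]]:
    """Step (a): index pairs of a longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # lcs[i][j] is the LCS length of a[i:] and b[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def aggregate_as_paths(a: List[int], b: List[int]) -> List[Tuple[str, List[int]]]:
    """Aggregate two AS_SEQUENCE-only paths into (segment type, AS numbers) pairs."""
    tuples: List[Tuple[str, int]] = []
    prev_i, prev_j = -1, -1
    # A sentinel pair just past both ends flushes any trailing intervening ASs.
    for i, j in _common_in_order(a, b) + [(len(a), len(b))]:
        # Step (b): intervening ASs from both paths form one AS_SET.
        intervening = a[prev_i + 1:i] + b[prev_j + 1:j]
        for asn in dict.fromkeys(intervening):  # de-duplicate, keep order
            tuples.append((AS_SET, asn))
        if i < len(a):
            tuples.append((AS_SEQUENCE, a[i]))
        prev_i, prev_j = i, j
    # Step (c): merge adjacent tuples of the same type, capping segment length.
    segments: List[Tuple[str, List[int]]] = []
    for seg_type, asn in tuples:
        if segments and segments[-1][0] == seg_type and len(segments[-1][1]) < MAX_SEGMENT_LEN:
            segments[-1][1].append(asn)
        else:
            segments.append((seg_type, [asn]))
    return segments


print(aggregate_as_paths([64500, 64501, 64503], [64500, 64502, 64503]))
# prints [('AS_SEQUENCE', [64500]), ('AS_SET', [64501, 64502]), ('AS_SEQUENCE', [64503])]
```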
   If, as a result of the above procedure, a given AS number appears more than once within the aggregated AS_PATH attribute, all but the last instance (rightmost occurrence) of that AS number should be removed from the aggregated AS_PATH attribute.

Security Considerations

A BGP implementation MUST support the authentication mechanism specified in RFC 2385 [RFC2385]. The authentication provided by this mechanism could be done on a per-peer basis.

BGP makes use of TCP for reliable transport of its traffic between peer routers. To provide connection-oriented integrity and data origin authentication on a point-to-point basis, BGP specifies use of the mechanism defined in RFC 2385. These services are intended to detect and reject active wiretapping attacks against the inter-router TCP connections. Absent the use of mechanisms that effect these security services, attackers can disrupt these TCP connections and/or masquerade as a legitimate peer router. Because the mechanism defined in RFC 2385 does not provide peer-entity authentication, these connections may be subject to some forms of replay attacks that will not be detected at the TCP layer. Such attacks might result in delivery (from TCP) of "broken" or "spoofed" BGP messages.

The mechanism defined in RFC 2385 augments the normal TCP checksum with a 16-byte message authentication code (MAC) that is computed over the same data as the TCP checksum. This MAC is based on a one-way hash function (MD5) and the use of a secret key. The key is shared between peer routers and is used to generate MAC values that are not readily computed by an attacker who does not have access to the key. A compliant implementation must support this mechanism, and must allow a network administrator to activate it on a per-peer basis.
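To make the computation concrete, the Python sketch below follows the input order given in RFC 2385: the TCP pseudo-header (source address, destination address, zero-padded protocol number, segment length), the fixed TCP header without options and with the checksum treated as zero, the segment data, and finally the shared key. It is an illustration under stated assumptions, not an interoperable implementation: it handles IPv4 only, the caller supplies the 20-byte fixed header and the length of any options, the function name tcp_md5_digest is hypothetical, and the segment values are made up. Note that TCP options are excluded from the digest input.

```python
import hashlib
import ipaddress
import struct

TCP_PROTOCOL = 6


def tcp_md5_digest(src_ip: str, dst_ip: str, tcp_header: bytes,
                   options_len: int, payload: bytes, key: bytes) -> bytes:
    """Compute a 16-byte RFC 2385-style digest for one IPv4 TCP segment."""
    if len(tcp_header) != 20:
        raise ValueError("expected the 20-byte fixed TCP header (no options)")
    segment_len = len(tcp_header) + options_len + len(payload)
    pseudo_header = (ipaddress.IPv4Address(src_ip).packed
                     + ipaddress.IPv4Address(dst_ip).packed
                     + struct.pack("!BBH", 0, TCP_PROTOCOL, segment_len))
    # Bytes 16-17 of the fixed header hold the checksum; treat them as zero.
    header = tcp_header[:16] + b"\x00\x00" + tcp_header[18:]
    return hashlib.md5(pseudo_header + header + payload + key).digest()


# Hypothetical segment from port 179: data offset 10 words (20-byte header plus
# 20 bytes of options), flags PSH+ACK, and a deliberately non-zero checksum.
hdr = struct.pack("!HHIIBBHHH", 179, 50000, 1, 1, 0xA0, 0x18, 16384, 0xBEEF, 0)
digest = tcp_md5_digest("192.0.2.1", "192.0.2.2", hdr, 20, b"BGP data", b"example-key")
print(len(digest))  # prints 16
```

Changing the key changes the digest, while changing the checksum field does not, which matches the "assuming a checksum of zero" rule of RFC 2385.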
RFC 2385 does not specify a means of managing (e.g., generating, distributing, and replacing) the keys used to compute the MAC. RFC 3562 [RFC3562] (an informational document) provides some guidance in this area, and provides rationale to support this guidance. It notes that a distinct key should be used for communication with each protected peer. If the same key is used for multiple peers, the offered security services may be degraded, e.g., due to increased risk of compromise at one router adversely affecting other routers.

The keys used for MAC computation should be changed periodically, to minimize the impact of a key compromise or successful cryptanalytic attack. RFC 3562 suggests a crypto period (the interval during which a key is employed) of at most 90 days. More frequent key changes reduce the likelihood that replay attacks (as described above) will be feasible. However, absent a standard mechanism for effecting such changes in a coordinated fashion between peers, one cannot assume that BGP-4 implementations complying with this RFC will support frequent key changes.

Obviously, each key also should be chosen so as to be hard for an attacker to guess. The techniques specified in RFC 1750 for random number generation provide a guide for generation of values that could be used as keys. RFC 2385 calls for implementations to support keys "composed of a string of printable ASCII of 80 bytes or less." RFC 3562 suggests keys used in this context be 12 to 24 bytes of random (pseudo-random) bits. This is fairly consistent with suggestions for analogous MAC algorithms, which typically employ keys in the range of 16-20 bytes.
RFC 3562 also observes that, to provide enough random bits at the low end of this range, a typical ASCII text string would have to be close to the upper bound for key length specified in RFC 2385.

BGP vulnerabilities analysis is discussed in [BGP_VULN].

IANA Considerations

All BGP messages contain an 8-bit message type, for which IANA is to create and maintain a registry entitled "BGP Message Types". This document defines the following message types:

   Name             Value   Definition
   ----             -----   ----------
   OPEN             1       See Section 4.2
   UPDATE           2       See Section 4.3
   KEEPALIVE        3       See Section 4.4
   NOTIFICATION     4       See Section 4.5

Future assignments are to be made using either the Standards Action process defined in [RFC2434], or the Early IANA Allocation process defined in [kompella-zinin]. Assignments consist of a name and the value.

The BGP UPDATE messages may carry one or more Path Attributes, where each Attribute contains an 8-bit Attribute Type Code. IANA is already maintaining such a registry, entitled "BGP Path Attributes". [Note to IANA: the registry already exists at http://www.iana.org/assignments/bgp-parameters, but should be renamed per this document. XXX to be removed upon RFC publication.] This document defines the following Path Attributes Type Codes:

   Name                Value   Definition
   ----                -----   ----------
   ORIGIN              1       See Section 5.1.1
   AS_PATH             2       See Section 5.1.2
   NEXT_HOP            3       See Section 5.1.3
   MULTI_EXIT_DISC     4       See Section 5.1.4
   LOCAL_PREF          5       See Section 5.1.5
   ATOMIC_AGGREGATE    6       See Section 5.1.6
   AGGREGATOR          7       See Section 5.1.7

Future assignments are to be made using either the Standards Action process defined in [RFC2434], or the Early IANA Allocation process defined in [kompella-zinin]. Assignments consist of a name and the value.
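As an illustration of how the type codes registered above appear on the wire, the sketch below walks a sequence of encoded path attributes. It assumes the attribute layout defined in Section 4.3 of this document (a flags octet, the type code octet, then a one-octet length, or a two-octet length when the Extended Length bit 0x10 is set). The function name decode_path_attributes is hypothetical, and no validation beyond length checks is performed.

```python
import struct
from typing import List, Tuple

# Path attribute type codes defined by this document.
PATH_ATTRIBUTE_NAMES = {
    1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP", 4: "MULTI_EXIT_DISC",
    5: "LOCAL_PREF", 6: "ATOMIC_AGGREGATE", 7: "AGGREGATOR",
}
EXTENDED_LENGTH = 0x10  # attribute flags bit: a two-octet length follows


def decode_path_attributes(data: bytes) -> List[Tuple[str, int, bytes]]:
    """Return (name, flags, value) for each attribute in `data`."""
    result = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated attribute header")
        flags, type_code = data[pos], data[pos + 1]
        pos += 2
        length_size = 2 if flags & EXTENDED_LENGTH else 1
        if pos + length_size > len(data):
            raise ValueError("truncated attribute length")
        if length_size == 2:
            (length,) = struct.unpack_from("!H", data, pos)
        else:
            length = data[pos]
        pos += length_size
        if pos + length > len(data):
            raise ValueError("attribute value runs past end of data")
        name = PATH_ATTRIBUTE_NAMES.get(type_code, f"unknown({type_code})")
        result.append((name, flags, data[pos:pos + length]))
        pos += length
    return result


# ORIGIN = IGP, then NEXT_HOP = 192.0.2.1; 0x40 marks well-known transitive attributes.
encoded = bytes([0x40, 1, 1, 0]) + bytes([0x40, 3, 4, 192, 0, 2, 1])
for name, flags, value in decode_path_attributes(encoded):
    print(name, hex(flags), value.hex())
```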
   The BGP NOTIFICATION message carries an 8-bit Error Code, for which
   IANA is to create and maintain a registry entitled "BGP Error Codes".
   This document defines the following Error Codes:

         Name                          Value       Definition
         ----                          -----       ----------
         Message Header Error            1         See Section 6.1
         OPEN Message Error              2         See Section 6.2
         UPDATE Message Error            3         See Section 6.3
         Hold Timer Expired              4         See Section 6.5
         Finite State Machine Error      5         See Section 6.6
         Cease                           6         See Section 6.7

   Future assignments are to be made using either the Standards Action
   process defined in [RFC2434], or the Early IANA Allocation process
   defined in [kompella-zinin]. Assignments consist of a name and the
   value.

   The BGP NOTIFICATION message carries an 8-bit Error Subcode, where
   each Subcode has to be defined within the context of a particular
   Error Code, and thus has to be unique only within that context.

   IANA is to create and maintain a set of registries, "Error Subcodes",
   with a separate registry for each BGP Error Code. Future assignments
   are to be made using either the Standards Action process defined in
   [RFC2434], or the Early IANA Allocation process defined in
   [kompella-zinin]. Assignments consist of a name and the value.
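   Since a Subcode is meaningful only together with its Error Code, one
   way to model the "Error Subcodes" registries is a map keyed first by
   Error Code. The sketch below fills it with the Error Codes listed
   above and the subcode values defined in the tables that follow;
   ErrorCode, SUBCODES, and describe are hypothetical names, and the
   fallback text for subcodes not listed is an assumption of the sketch.

```python
from enum import IntEnum
from typing import Dict


class ErrorCode(IntEnum):
    """Entries this document places in the "BGP Error Codes" registry."""
    MESSAGE_HEADER_ERROR = 1
    OPEN_MESSAGE_ERROR = 2
    UPDATE_MESSAGE_ERROR = 3
    HOLD_TIMER_EXPIRED = 4
    FINITE_STATE_MACHINE_ERROR = 5
    CEASE = 6


# One subcode registry per Error Code; values are unique only within
# their own registry. Deprecated values keep a placeholder entry.
SUBCODES: Dict[ErrorCode, Dict[int, str]] = {
    ErrorCode.MESSAGE_HEADER_ERROR: {
        1: "Connection Not Synchronized",
        2: "Bad Message Length",
        3: "Bad Message Type",
    },
    ErrorCode.OPEN_MESSAGE_ERROR: {
        1: "Unsupported Version Number",
        2: "Bad Peer AS",
        3: "Bad BGP Identifier",
        4: "Unsupported Optional Parameter",
        5: "[Deprecated]",
        6: "Unacceptable Hold Time",
    },
    ErrorCode.UPDATE_MESSAGE_ERROR: {
        1: "Malformed Attribute List",
        2: "Unrecognized Well-known Attribute",
        3: "Missing Well-known Attribute",
        4: "Attribute Flags Error",
        5: "Attribute Length Error",
        6: "Invalid ORIGIN Attribute",
        7: "[Deprecated]",
        8: "Invalid NEXT_HOP Attribute",
        9: "Optional Attribute Error",
        10: "Invalid Network Field",
        11: "Malformed AS_PATH",
    },
}


def describe(code: int, subcode: int) -> str:
    """Resolve a subcode within the registry of its own Error Code."""
    error = ErrorCode(code)  # ValueError for codes not defined here
    name = SUBCODES.get(error, {}).get(subcode, f"subcode {subcode}")
    return f"{error.name} / {name}"
```

   The same subcode value resolves differently per Error Code:
   describe(1, 3) yields "MESSAGE_HEADER_ERROR / Bad Message Type",
   whereas describe(2, 3) yields "OPEN_MESSAGE_ERROR / Bad BGP
   Identifier".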
   This document defines the following Message Header Error subcodes:

         Name                               Value       Definition
         ----                               -----       ----------
         Connection Not Synchronized          1         See Section 6.1
         Bad Message Length                   2         See Section 6.1
         Bad Message Type                     3         See Section 6.1

   This document defines the following OPEN Message Error subcodes:

         Name                               Value       Definition
         ----                               -----       ----------
         Unsupported Version Number           1         See Section 6.2
         Bad Peer AS                          2         See Section 6.2
         Bad BGP Identifier                   3         See Section 6.2
         Unsupported Optional Parameter       4         See Section 6.2
         [Deprecated]                         5         See Appendix A
         Unacceptable Hold Time               6         See Section 6.2

   This document defines the following UPDATE Message Error subcodes:

         Name                               Value       Definition
         ----                               -----       ----------
         Malformed Attribute List             1         See Section 6.3
         Unrecognized Well-known Attribute    2         See Section 6.3
         Missing Well-known Attribute         3         See Section 6.3
         Attribute Flags Error                4         See Section 6.3
         Attribute Length Error               5         See Section 6.3
         Invalid ORIGIN Attribute             6         See Section 6.3
         [Deprecated]                         7         See Appendix A
         Invalid NEXT_HOP Attribute           8         See Section 6.3
         Optional Attribute Error             9         See Section 6.3
         Invalid Network Field               10         See Section 6.3
         Malformed AS_PATH                   11         See Section 6.3

IPR Disclosure Acknowledgement

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance with
   RFC 3668.

Copyright Notice

   Copyright (C) The Internet Society (year). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.
   Additional copyright notices are not permitted in IETF Documents
   except in the case where such document is the product of a joint
   development effort between the IETF and another standards development
   organization or the document is a republication of the work of
   another standards organization. Such exceptions must be approved on
   an individual basis by the IAB.

Disclaimer

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Normative References

   [RFC791]   Postel, J., "Internet Protocol - DARPA Internet Program
              Protocol Specification", RFC 791, September 1981.

   [RFC793]   Postel, J., "Transmission Control Protocol - DARPA Internet
              Program Protocol Specification", RFC 793, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2385]  Heffernan, A., "Protection of BGP Sessions via the TCP MD5
              Signature Option", RFC 2385, August 1998.

   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 2434, October
              1998.

   [RFC2474]  Nichols, K., et al., "Definition of the Differentiated
              Services Field (DS Field) in the IPv4 and IPv6 Headers",
              RFC 2474, December 1998.

Non-normative References

   [RFC904]   Mills, D., "Exterior Gateway Protocol Formal
              Specification", RFC 904, April 1984.
   [RFC1092]  Rekhter, Y., "EGP and Policy Based Routing in the New
              NSFNET Backbone", RFC 1092, February 1989.

   [RFC1093]  Braun, H-W., "The NSFNET Routing Architecture", RFC 1093,
              February 1989.

   [RFC1772]  Rekhter, Y. and P. Gross, "Application of the Border
              Gateway Protocol in the Internet", RFC 1772, March 1995.

   [RFC1518]  Rekhter, Y. and T. Li, "An Architecture for IP Address
              Allocation with CIDR", RFC 1518, September 1993.

   [RFC1519]  Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless
              Inter-Domain Routing (CIDR): an Address Assignment and
              Aggregation Strategy", RFC 1519, September 1993.

   [RFC1930]  Hawkinson, J. and T. Bates, "Guidelines for creation,
              selection, and registration of an Autonomous System (AS)",
              RFC 1930, March 1996.

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities
              Attribute", RFC 1997, August 1996.

   [RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
              Flap Damping", RFC 2439, November 1998.

   [RFC2796]  Bates, T., Chandra, R., and E. Chen, "BGP Route Reflection
              - An Alternative to Full Mesh IBGP", RFC 2796, April 2000.

   [RFC3392]  Chandra, R. and J. Scudder, "Capabilities Advertisement
              with BGP-4", RFC 3392, November 2002.

   [RFC2858]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

   [RFC2918]  Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
              September 2000.

   [RFC3065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 3065, February 2001.

   [RFC3562]  Leech, M., "Key Management Considerations for the TCP MD5
              Signature Option", RFC 3562, July 2003.
   [RFC3563]  Cooperative Agreement Between the ISOC/IETF and ISO/IEC
              Joint

   [IS10747]  "Information Processing Systems - Telecommunications and
              Information Exchange between Systems - Protocol for
              Exchange of Inter-domain Routeing Information among
              Intermediate Systems to Support Forwarding of ISO 8473
              PDUs", ISO/IEC IS10747, 1993.

   [BGP_VULN] Murphy, S., "BGP Security Vulnerabilities Analysis",
              draft-ietf-idr-bgp-vuln-00.txt, work in progress.

   [kompella-zinin]
              Kompella, K. and A. Zinin, "Early IANA Allocation of
              Standards Track Codepoints", work in progress.

Editors' Addresses

   Yakov Rekhter
   Juniper Networks
   email: yakov@juniper.net

   Tony Li
   email: tony.li@tony.li

   Susan Hares
   NextHop Technologies, Inc.
   email: skh@nexthop.com