idnits 2.17.1 draft-ietf-idr-bgp4-19.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 85 longer pages, the longest (page 2) being 61 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 58 instances of too long lines in the document, the longest one being 9 characters in excess of 72. ** There is 1 instance of lines with control characters in the document. ** The abstract seems to contain references ([RFC1518,RFC1519]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There is 3 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 356 has weird spacing: '...setting any B...' == Line 1841 has weird spacing: '... system autom...' == Line 1938 has weird spacing: '...on port numbe...' == Line 2109 has weird spacing: '...empt to to...' == Line 2685 has weird spacing: '...rom the under...' == (1 more instance...) -- The exact meaning of the all-uppercase expression 'MAY NOT' is not defined in RFC 2119. 
If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. == The expression 'MAY NOT', while looking like RFC 2119 requirements text, is not defined in RFC 2119, and should not be used. Consider using 'MUST NOT' instead (if that is what you mean). Found 'MAY NOT' in this paragraph: Well-known attributes MUST be recognized by all BGP implementations. Some of these attributes are mandatory and MUST be included in every UPDATE message that contains NLRI. Others are discretionary and MAY or MAY NOT be sent in a particular UPDATE message. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: Optional attributes: Passive TCP establishment flag SHOULD not be set. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: Optional attributes: 1) Perform automatic start flag SHOULD be set. if this event occurs. 2) if the passive Passive TCP establishment flag is supported, it SHOULD not be set if this event occurs. 3) if bgp peer oscillation damping is supported, the BGP stop_peer_flap flag should not be set when this event occurs. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'SHOULD not' in this paragraph: Optional attributes: 1) Perform Automatic start flag SHOULD be set 2) Passive TCP establishment flag SHOULD be set 3) If the bgp peer oscillation flag is supported, the stop_peer_flap flag SHOULD not be set. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: Optional attributes: 1) Perform automatic start flag SHOULD be set 2) stop_peer_flap flag SHOULD be set 3) Passive TCP establishment flag SHOULD not be set (cleared). == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: optional attributes: 1) Delay Open flag SHOULD not be set 2) Open Delay timer SHOULD not be running == The expression 'MAY NOT', while looking like RFC 2119 requirements text, is not defined in RFC 2119, and should not be used. Consider using 'MUST NOT' instead (if that is what you mean). Found 'MAY NOT' in this paragraph: If the route is learned from an external peer, then the local BGP speaker computes the degree of preference based on preconfigured policy information. If the return value indicates that the route is ineligible, the route MAY NOT serve as an input to the next RFC DRAFT March 2003 == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'MUST not' in this paragraph: If due to the limits on the maximum size of an UPDATE message (see Section 4) a single route doesn't fit into the message, the BGP speaker MUST not advertise the route to its peers and MAY choose to log an error locally. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RF3065' is mentioned on line 1335, but not defined == Missing Reference: 'Event6' is mentioned on line 2220, but not defined == Missing Reference: 'Event7' is mentioned on line 2221, but not defined == Missing Reference: 'Event 13' is mentioned on line 2222, but not defined == Missing Reference: 'Events 9-12' is mentioned on line 2227, but not defined == Missing Reference: '15-28' is mentioned on line 2227, but not defined == Missing Reference: 'Event 1' is mentioned on line 2235, but not defined == Missing Reference: '3-7' is mentioned on line 2629, but not defined == Missing Reference: 'Event2' is mentioned on line 2769, but not defined == Missing Reference: 'Event12' is mentioned on line 2396, but not defined == Missing Reference: 'Event 14' is mentioned on line 2816, but not defined == Missing Reference: 'Event 15' is mentioned on line 2820, but not defined == Missing Reference: 'Event18' is mentioned on line 
2847, but not defined == Missing Reference: 'Event 20' is mentioned on line 2446, but not defined == Missing Reference: 'Event 21' is mentioned on line 2464, but not defined == Missing Reference: 'Event 22' is mentioned on line 2465, but not defined == Missing Reference: 'Event24' is mentioned on line 2607, but not defined == Missing Reference: 'Event1' is mentioned on line 2629, but not defined == Missing Reference: 'Event9' is mentioned on line 2385, but not defined == Missing Reference: 'Event 18' is mentioned on line 2685, but not defined == Missing Reference: 'Events 8' is mentioned on line 2494, but not defined == Missing Reference: '10-11' is mentioned on line 2494, but not defined -- Looks like a reference, but probably isn't: '13' on line 2494 -- Looks like a reference, but probably isn't: '19' on line 2494 -- Looks like a reference, but probably isn't: '23' on line 2494 == Missing Reference: '25-28' is mentioned on line 2613, but not defined == Missing Reference: 'Event 2' is mentioned on line 2634, but not defined == Missing Reference: 'Event 8' is mentioned on line 2644, but not defined == Missing Reference: 'Event 10' is mentioned on line 2655, but not defined == Missing Reference: 'Event 16' is mentioned on line 2538, but not defined == Missing Reference: 'Event 17' is mentioned on line 2541, but not defined == Missing Reference: 'Event 19' is mentioned on line 2829, but not defined == Missing Reference: 'Event21' is mentioned on line 2715, but not defined == Missing Reference: 'Event22' is mentioned on line 2717, but not defined == Missing Reference: 'Event 23' is mentioned on line 2833, but not defined == Missing Reference: 'Events 9' is mentioned on line 2885, but not defined == Missing Reference: '11-13' is mentioned on line 2613, but not defined -- Looks like a reference, but probably isn't: '20' on line 2613 == Missing Reference: 'Event 11' is mentioned on line 2669, but not defined == Missing Reference: 'Event 25' is mentioned on line 2686, but 
not defined == Missing Reference: 'Event 24' is mentioned on line 2696, but not defined == Missing Reference: 'Event 26' is mentioned on line 2858, but not defined == Missing Reference: '12-13' is mentioned on line 2885, but not defined == Missing Reference: '27-28' is mentioned on line 2749, but not defined == Missing Reference: 'Event8' is mentioned on line 2782, but not defined == Missing Reference: 'Event10' is mentioned on line 2797, but not defined == Missing Reference: 'Event11' is mentioned on line 2808, but not defined == Missing Reference: 'Event27' is mentioned on line 2863, but not defined == Missing Reference: 'Event28' is mentioned on line 2874, but not defined == Missing Reference: '20-22' is mentioned on line 2885, but not defined -- Looks like a reference, but probably isn't: '9' on line 3712 == Unused Reference: 'RFC1772' is defined on line 3968, but no explicit reference was found in the text == Unused Reference: 'RFC1997' is defined on line 3978, but no explicit reference was found in the text == Unused Reference: 'RFC2439' is defined on line 3981, but no explicit reference was found in the text == Unused Reference: 'RFC2858' is defined on line 3990, but no explicit reference was found in the text ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2385 (Obsoleted by RFC 5925) ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) -- Obsolete informational reference (is this intentional?): RFC 1519 (Obsoleted by RFC 4632) -- Obsolete informational reference (is this intentional?): RFC 2796 (Obsoleted by RFC 4456) -- Obsolete informational reference (is this intentional?): RFC 2842 (Obsoleted by RFC 3392) -- Obsolete informational reference (is this intentional?): RFC 2858 (Obsoleted by RFC 4760) -- Obsolete informational reference (is this intentional?): RFC 3065 (Obsoleted by RFC 5065) == Outdated reference: A later version (-01) exists of draft-ietf-idr-bgp-vuln-00 Summary: 8 errors 
(**), 0 flaws (~~), 68 warnings (==), 13 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Y. Rekhter 2 INTERNET DRAFT Juniper Networks 3 T. Li 4 Procket Networks, Inc. 5 S. Hares 6 NextHop Technologies, Inc. 7 Editors 9 A Border Gateway Protocol 4 (BGP-4) 10 12 Status of this Memo 14 This document is an Internet-Draft and is in full conformance with 15 all provisions of Section 10 of RFC2026. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as ``work in progress.'' 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Specification of Requirements 35 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 36 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 37 document are to be interpreted as described in RFC2119 [RFC2119]. 39 RFC DRAFT March 2003 41 Table of Contents 43 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 44 1. Definition of commonly used terms . . . . . . . . . . . . . . 4 45 2. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 6 46 3. Summary of Operation . . . . . . . . . . . . . . . . . . . . . 7 47 3.1 Routes: Advertisement and Storage . . . . . . . . . . . . . . 9 48 3.2 Routing Information Bases . . . . . . . . . . . . . . . . . . 10 49 4.
Message Formats . . . . . . . . . . . . . . . . . . . . . . . 11 50 4.1 Message Header Format . . . . . . . . . . . . . . . . . . . . 11 51 4.2 OPEN Message Format . . . . . . . . . . . . . . . . . . . . . 12 52 4.3 UPDATE Message Format . . . . . . . . . . . . . . . . . . . . 14 53 4.4 KEEPALIVE Message Format . . . . . . . . . . . . . . . . . . 21 54 4.5 NOTIFICATION Message Format . . . . . . . . . . . . . . . . . 21 55 5. Path Attributes . . . . . . . . . . . . . . . . . . . . . . . 23 56 5.1 Path Attribute Usage . . . . . . . . . . . . . . . . . . . . 25 57 5.1.1 ORIGIN . . . . . . . . . . . . . . . . . . . . . . . . . . 25 58 5.1.2 AS_PATH . . . . . . . . . . . . . . . . . . . . . . . . . . 25 59 5.1.3 NEXT_HOP . . . . . . . . . . . . . . . . . . . . . . . . . 26 60 5.1.4 MULTI_EXIT_DISC . . . . . . . . . . . . . . . . . . . . . . 28 61 5.1.5 LOCAL_PREF . . . . . . . . . . . . . . . . . . . . . . . . 28 62 5.1.6 ATOMIC_AGGREGATE . . . . . . . . . . . . . . . . . . . . . 29 63 5.1.7 AGGREGATOR . . . . . . . . . . . . . . . . . . . . . . . . 30 64 6. BGP Error Handling . . . . . . . . . . . . . . . . . . . . . . 30 65 6.1 Message Header error handling . . . . . . . . . . . . . . . . 30 66 6.2 OPEN message error handling . . . . . . . . . . . . . . . . . 31 67 6.3 UPDATE message error handling . . . . . . . . . . . . . . . . 32 68 6.4 NOTIFICATION message error handling . . . . . . . . . . . . . 34 69 6.5 Hold Timer Expired error handling . . . . . . . . . . . . . . 34 70 6.6 Finite State Machine error handling . . . . . . . . . . . . . 34 71 6.7 Cease . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 72 6.8 BGP connection collision detection . . . . . . . . . . . . . 35 73 7. BGP Version Negotiation . . . . . . . . . . . . . . . . . . . 36 74 8. BGP Finite State machine . . . . . . . . . . . . . . . . . . . 36 75 8.1 Events for the BGP FSM . . . . . . . . . . . . . . . . . . . 37 76 8.1.1 Administrative Events . . . . . . . . . . . . . . . . . . 
37 77 8.1.2 Timer Events . . . . . . . . . . . . . . . . . . . . . . . 38 78 8.1.3 TCP connection based Events . . . . . . . . . . . . . . . . 39 79 8.1.4 BGP Messages based Events . . . . . . . . . . . . . . . . . 41 80 8.2 Description of FSM . . . . . . . . . . . . . . . . . . . . . 43 81 8.2.1 FSM Definition . . . . . . . . . . . . . . . . . . . . . . 43 82 8.2.1.1 Terms "active" and "passive" . . . . . . . . . . . . . . 43 83 8.2.1.2 FSM and collision detection . . . . . . . . . . . . . . . 44 84 8.2.2 Finite State Machine . . . . . . . . . . . . . . . . . . . 44 85 9. UPDATE Message Handling . . . . . . . . . . . . . . . . . . . 57 86 9.1 Decision Process . . . . . . . . . . . . . . . . . . . . . . 58 87 9.1.1 Phase 1: Calculation of Degree of Preference . . . . . . . 59 88 RFC DRAFT March 2003 90 9.1.2 Phase 2: Route Selection . . . . . . . . . . . . . . . . . 60 91 9.1.2.1 Route Resolvability Condition . . . . . . . . . . . . . . 61 92 9.1.2.2 Breaking Ties (Phase 2) . . . . . . . . . . . . . . . . . 62 93 9.1.3 Phase 3: Route Dissemination . . . . . . . . . . . . . . . 64 94 9.1.4 Overlapping Routes . . . . . . . . . . . . . . . . . . . . 65 95 9.2 Update-Send Process . . . . . . . . . . . . . . . . . . . . . 66 96 9.2.1 Controlling Routing Traffic Overhead . . . . . . . . . . . 67 97 9.2.1.1 Frequency of Route Advertisement . . . . . . . . . . . . 67 98 9.2.1.2 Frequency of Route Origination . . . . . . . . . . . . . 68 99 9.2.2 Efficient Organization of Routing Information . . . . . . . 68 100 9.2.2.1 Information Reduction . . . . . . . . . . . . . . . . . . 68 101 9.2.2.2 Aggregating Routing Information . . . . . . . . . . . . . 69 102 9.3 Route Selection Criteria . . . . . . . . . . . . . . . . . . 72 103 9.4 Originating BGP routes . . . . . . . . . . . . . . . . . . . 72 104 10. BGP Timers . . . . . . . . . . . . . . . . . . . . . . . . . 72 105 Appendix A. Comparison with RFC1771 . . . . . . . . . . . . . . . 73 106 Appendix B. 
Comparison with RFC1267 . . . . . . . . . . . . . . . 74 107 Appendix C. Comparison with RFC 1163 . . . . . . . . . . . . . . 75 108 Appendix D. Comparison with RFC 1105 . . . . . . . . . . . . . . 75 109 Appendix E. TCP options that may be used with BGP . . . . . . . . 76 110 Appendix F. Implementation Recommendations . . . . . . . . . . . 76 111 Appendix F.1 Multiple Networks Per Message . . . . . . . . . . . 76 112 Appendix F.2 Reducing route flapping . . . . . . . . . . . . . . 77 113 Appendix F.3 Path attribute ordering . . . . . . . . . . . . . . 77 114 Appendix F.4 AS_SET sorting . . . . . . . . . . . . . . . . . . . 77 115 Appendix F.5 Control over version negotiation . . . . . . . . . . 78 116 Appendix F.6 Complex AS_PATH aggregation . . . . . . . . . . . . 78 117 Security Considerations . . . . . . . . . . . . . . . . . . . . . 79 118 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . . 79 119 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 120 Authors Information . . . . . . . . . . . . . . . . . . . . . . . 80 121 RFC DRAFT March 2003 123 Abstract 125 The Border Gateway Protocol (BGP) is an inter-Autonomous System rout- 126 ing protocol. 128 The primary function of a BGP speaking system is to exchange network 129 reachability information with other BGP systems. This network reacha- 130 bility information includes information on the list of Autonomous 131 Systems (ASs) that reachability information traverses. This informa- 132 tion is sufficient to construct a graph of AS connectivity from which 133 routing loops may be pruned and some policy decisions at the AS level 134 may be enforced. 136 BGP-4 provides a set of mechanisms for supporting Classless Inter- 137 Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms include 138 support for advertising a set of destinations as an IP prefix and 139 eliminating the concept of network "class" within BGP. 
BGP-4 also 140 introduces mechanisms which allow aggregation of routes, including 141 aggregation of AS paths. 143 Routing information exchanged via BGP supports only the destination- 144 based forwarding paradigm, which assumes that a router forwards a 145 packet based solely on the destination address carried in the IP 146 header of the packet. This, in turn, reflects the set of policy deci- 147 sions that can (and can not) be enforced using BGP. BGP can support 148 only the policies conforming to the destination-based forwarding 149 paradigm. 151 1. Definition of commonly used terms 153 This section provides definitions for terms that have a specific mean- 154 ing to the BGP protocol and that are used throughout the text. 156 Autonomous System (AS) 157 The classic definition of an Autonomous System is a set of routers 158 under a single technical administration, using an interior gateway 159 protocol (IGP) and common metrics to determine how to route pack- 160 ets within the AS, and using an inter-AS routing protocol to 161 determine how to route packets to other ASs. Since this classic 162 definition was developed, it has become common for a single AS to 163 use several IGPs and sometimes several sets of metrics within an 164 AS. The use of the term Autonomous System here stresses the fact 165 that, even when multiple IGPs and metrics are used, the adminis- 166 tration of an AS appears to other ASs to have a single coherent 167 interior routing plan and presents a consistent picture of what 168 destinations are reachable through it. 172 BGP speaker 173 A router that implements BGP. 175 BGP Identifier 176 A 4-octet unsigned integer indicating the BGP Identifier of the 177 sender of BGP messages. A given BGP speaker sets the value of its 178 BGP Identifier to an IP address assigned to that BGP speaker. The 179 value of the BGP Identifier is determined on startup and is the 180 same for every local interface and every BGP peer.
182 Internal peer 183 Peer that is in the same Autonomous System as the local system. 185 IBGP 186 Internal BGP (BGP connection between internal peers). 188 External peer 189 Peer that is in a different Autonomous System than the local sys- 190 tem. 192 EBGP 193 External BGP (BGP connection between external peers). 195 NLRI 196 Network Layer Reachability Information. 198 Route 199 A unit of information that pairs a set of destinations with the 200 attributes of a path to those destinations. The set of destina- 201 tions are systems whose IP addresses are contained in one IP 202 address prefix carried in the Network Layer Reachability Informa- 203 tion (NLRI) field of an UPDATE message. The path is the informa- 204 tion reported in the path attributes field of the same UPDATE mes- 205 sage. 207 RIB 208 Routing Information Base. 210 Adj-RIB-In 211 The Adj-RIBs-In contain unprocessed routing information that has 212 been advertised to the local BGP speaker by its peers. 214 Loc-RIB 215 The Loc-RIB contains the routes that have been selected by the 216 local BGP speaker's Decision Process. 218 Adj-RIB-Out 219 The Adj-RIBs-Out contains the routes for advertisement to specific 220 RFC DRAFT March 2003 222 peers by means of the local speaker's UPDATE messages. 224 IGP 225 Interior Gateway Protocol - a routing protocol used to exchange 226 routing information among routers within a single Autonomous Sys- 227 tem. 229 Feasible route 230 A route that is available for use. 232 Unfeasible route 233 A previously advertised feasible route that is no longer available 234 for use. 236 2. Acknowledgments 238 This document was originally published as RFC 1267 in October 1991, 239 jointly authored by Kirk Lougheed and Yakov Rekhter. 241 We would like to express our thanks to Guy Almes, Len Bosack, and 242 Jeffrey C. Honig for their contributions to the earlier version 243 (BGP-1) of this document. 
245 We would like to specially acknowledge numerous contributions by Den- 246 nis Ferguson to the earlier version of this document. 248 We would like to explicitly thank Bob Braden for the review of the earlier 249 version (BGP-2) of this document as well as his constructive and 250 valuable comments. 252 We would also like to thank Bob Hinden, Director for Routing of the 253 Internet Engineering Steering Group, and the team of reviewers he 254 assembled to review the earlier version (BGP-2) of this document. 255 This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia 256 Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted 257 with a strong combination of toughness, professionalism, and cour- 258 tesy. 260 Certain sections of the document borrowed heavily from IDRP 261 [IS10747], which is the OSI counterpart of BGP. For this, credit 262 should be given to the ANSI X3S3.3 group chaired by Lyman Chapin and 263 to Charles Kunzinger who was the IDRP editor within that group. 265 We would also like to thank Benjamin Abarbanel, Enke Chen, Edward 266 Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry 267 Haskin, John Krawczyk, David LeRoy, Dan Massey, Jonathan Natale, Dan 268 Pei, Mathew Richardson, John Scudder, John Stewart III, Dave Thaler, 271 Paul Traina, Russ White, Curtis Villamizar, and Alex Zinin for their 272 comments. 274 We would like to specially acknowledge Andrew Lange for his help in 275 preparing the final version of this document. 277 Finally, we would like to thank all the members of the IDR Working 278 Group for the ideas and support they have given to this document. 280 3. Summary of Operation 282 The Border Gateway Protocol (BGP) is an inter-Autonomous System rout- 283 ing protocol. It is built on experience gained with EGP as defined in 284 [RFC904] and EGP usage in the NSFNET Backbone as described in 285 [RFC1092] and [RFC1093].
287 The primary function of a BGP speaking system is to exchange network 288 reachability information with other BGP systems. This network reacha- 289 bility information includes information on the list of Autonomous 290 Systems (ASs) that reachability information traverses. This informa- 291 tion is sufficient to construct a graph of AS connectivity from which 292 routing loops may be pruned and some policy decisions at the AS level 293 may be enforced. 295 In the context of this document we assume that a BGP speaker adver- 296 tises to its peers only those routes that it itself uses (in this 297 context a BGP speaker is said to "use" a BGP route if it is the most 298 preferred BGP route and is used in forwarding). All other cases are 299 outside the scope of this document. 301 In the context of this document the term "IP address" refers to an IP 302 Version 4 address [RFC791]. 304 Routing information exchanged via BGP supports only the destination- 305 based forwarding paradigm, which assumes that a router forwards a 306 packet based solely on the destination address carried in the IP 307 header of the packet. This, in turn, reflects the set of policy deci- 308 sions that can (and can not) be enforced using BGP. Note that some 309 policies can not be supported by the destination-based forwarding 310 paradigm, and thus require techniques such as source routing (aka 311 explicit routing) to be enforced. Such policies can not be enforced 312 using BGP either. For example, BGP does not enable one AS to send 313 traffic to a neighboring AS for forwarding to some destination 314 (reachable through but) beyond that neighboring AS intending that the 315 traffic take a different route to that taken by the traffic originat- 316 ing in the neighboring AS (for that same destination). On the other 317 hand, BGP can support any policy conforming to the destination-based 318 RFC DRAFT March 2003 320 forwarding paradigm. 
322 BGP-4 provides a new set of mechanisms for supporting Classless 323 Inter-Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms 324 include support for advertising a set of destinations as an IP prefix 325 and eliminating the concept of network "class" within BGP. BGP-4 326 also introduces mechanisms which allow aggregation of routes, includ- 327 ing aggregation of AS paths. 329 This document uses the term `Autonomous System' (AS) throughout. The 330 classic definition of an Autonomous System is a set of routers under 331 a single technical administration, using an interior gateway protocol 332 (IGP) and common metrics to determine how to route packets within the 333 AS, and using an inter-AS routing protocol to determine how to route 334 packets to other ASs. Since this classic definition was developed, it 335 has become common for a single AS to use several IGPs and sometimes 336 several sets of metrics within an AS. The use of the term Autonomous 337 System here stresses the fact that, even when multiple IGPs and met- 338 rics are used, the administration of an AS appears to other ASs to 339 have a single coherent interior routing plan and presents a consis- 340 tent picture of what destinations are reachable through it. 342 BGP uses TCP [RFC793] as its transport protocol. This eliminates the 343 need to implement explicit update fragmentation, retransmission, 344 acknowledgment, and sequencing. BGP listens on TCP port 179. The 345 error notification mechanism used in BGP assumes that TCP supports a 346 "graceful" close, i.e., that all outstanding data will be delivered 347 before the connection is closed. 349 Two systems form a TCP connection between one another. They exchange 350 messages to open and confirm the connection parameters. 352 The initial data flow is the portion of the BGP routing table that is 353 allowed by the export policy, called the Adj-RIBs-Out (see 3.2). 354 Incremental updates are sent as the routing tables change.
BGP does 355 not require periodic refresh of the routing table. To allow local 356 policy changes to have the correct effect without resetting any BGP 357 connections, a BGP speaker SHOULD either (a) retain the current ver- 358 sion of the routes advertised to it by all of its peers for the dura- 359 tion of the connection, or (b) make use of the Route Refresh exten- 360 sion [RFC2918]. 362 KEEPALIVE messages may be sent periodically to ensure the liveness of 363 the connection. NOTIFICATION messages are sent in response to errors 364 or special conditions. If a connection encounters an error condition, 365 a NOTIFICATION message is sent and the connection is closed. 367 A peer in a different AS is referred to as an external peer, while a 370 peer in the same AS is referred to as an internal peer. Internal BGP 371 and external BGP are commonly abbreviated IBGP and EBGP. 373 If a particular AS has multiple BGP speakers and is providing transit 374 service for other ASs, then care must be taken to ensure a consistent 375 view of routing within the AS. A consistent view of the interior 376 routes of the AS is provided by the IGP used within the AS. For the 377 purpose of this document, it is assumed that a consistent view of the 378 routes exterior to the AS is provided by having all BGP speakers 379 within the AS maintain IBGP with each other. Care must be taken to 380 ensure that the interior routers have all been updated with transit 381 information before the BGP speakers announce to other ASs that tran- 382 sit service is being provided. 384 This document specifies the base behavior of the BGP protocol. This 385 behavior can be, and is, modified by extension specifications. When the 386 protocol is extended, the new behavior is fully documented in the 387 extension specifications.
389 3.1 Routes: Advertisement and Storage 391 For the purpose of this protocol, a route is defined as a unit of 392 information that pairs a set of destinations with the attributes of a 393 path to those destinations. The set of destinations are systems whose 394 IP addresses are contained in one IP address prefix carried in the 395 Network Layer Reachability Information (NLRI) field of an UPDATE mes- 396 sage, and the path is the information reported in the path attributes 397 field of the same UPDATE message. 399 Routes are advertised between BGP speakers in UPDATE messages. Mul- 400 tiple routes that have the same path attributes can be advertised in 401 a single UPDATE message by including multiple prefixes in the NLRI 402 field of the UPDATE message. 404 Routes are stored in the Routing Information Bases (RIBs): namely, 405 the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out, as described in 406 Section 3.2. 408 If a BGP speaker chooses to advertise the route, it MAY add to or 409 modify the path attributes of the route before advertising it to a 410 peer. 412 BGP provides mechanisms by which a BGP speaker can inform its peer 413 that a previously advertised route is no longer available for use. 414 There are three methods by which a given BGP speaker can indicate 415 that a route has been withdrawn from service: 419 a) the IP prefix that expresses the destination for a previously 420 advertised route can be advertised in the WITHDRAWN ROUTES field 421 in the UPDATE message, thus marking the associated route as being 422 no longer available for use, 424 b) a replacement route with the same NLRI can be advertised, or 426 c) the connection between the two BGP speakers can be closed, which 427 implicitly removes from service all routes which the pair of 428 speakers had advertised to each other. 430 Changing the attributes of a route is accomplished by advertising a 431 replacement route.
The replacement route carries new (changed) 432 attributes and has the same NLRI as the original route. 434 3.2 Routing Information Bases 436 The Routing Information Base (RIB) within a BGP speaker consists of 437 three distinct parts: 439 a) Adj-RIBs-In: The Adj-RIBs-In store routing information that has 440 been learned from inbound UPDATE messages received from other BGP 441 speakers. Their contents represent routes that are available as an 442 input to the Decision Process. 444 b) Loc-RIB: The Loc-RIB contains the local routing information 445 that the BGP speaker has selected by applying its local policies 446 to the routing information contained in its Adj-RIBs-In. These are 447 the routes that will be used by the local BGP speaker. The next 448 hop for each of these routes MUST be resolvable via the local BGP 449 speaker's Routing Table. 451 c) Adj-RIBs-Out: The Adj-RIBs-Out store the information that the 452 local BGP speaker has selected for advertisement to its peers. The 453 routing information stored in the Adj-RIBs-Out will be carried in 454 the local BGP speaker's UPDATE messages and advertised to its 455 peers. 457 In summary, the Adj-RIBs-In contain unprocessed routing information 458 that has been advertised to the local BGP speaker by its peers; the 459 Loc-RIB contains the routes that have been selected by the local BGP 460 speaker's Decision Process; and the Adj-RIBs-Out organize the routes 461 for advertisement to specific peers by means of the local speaker's 462 UPDATE messages. 464 Although the conceptual model distinguishes between Adj-RIBs-In, Loc- 465 RIB, and Adj-RIBs-Out, this neither implies nor requires that an 466 RFC DRAFT March 2003 468 implementation must maintain three separate copies of the routing 469 information. The choice of implementation (for example, 3 copies of 470 the information vs 1 copy with pointers) is not constrained by the 471 protocol. 
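As an illustration of the "1 copy with pointers" option mentioned above, the following Python sketch stores each received route once and lets the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out hold references to the same object. The names Route, Speaker, local_pref, receive, decide, and stage_for are hypothetical, and the "highest local_pref wins" rule is only a stand-in for the real Decision Process, which this section does not define.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    prefix: str             # destination, e.g. "192.0.2.0/24"
    next_hop: str
    local_pref: int = 100   # hypothetical preference used only by this sketch


@dataclass
class Speaker:
    adj_rib_in: dict = field(default_factory=dict)    # peer -> {prefix: Route}
    loc_rib: dict = field(default_factory=dict)       # prefix -> Route
    adj_rib_out: dict = field(default_factory=dict)   # peer -> {prefix: Route}

    def receive(self, peer, route):
        """Record a route learned from an inbound UPDATE in that peer's Adj-RIB-In."""
        self.adj_rib_in.setdefault(peer, {})[route.prefix] = route

    def decide(self):
        """Toy stand-in for the Decision Process: highest local_pref wins."""
        self.loc_rib.clear()
        for routes in self.adj_rib_in.values():
            for route in routes.values():
                best = self.loc_rib.get(route.prefix)
                if best is None or route.local_pref > best.local_pref:
                    self.loc_rib[route.prefix] = route   # a reference, not a copy

    def stage_for(self, peer):
        """Fill one peer's Adj-RIB-Out from the Loc-RIB (no export policy applied)."""
        self.adj_rib_out[peer] = dict(self.loc_rib)
```

Because the three tables share Route objects, the conceptual separation is preserved without keeping three copies of the routing information.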
473 Routing information that the BGP speaker uses to forward packets (or 474 to construct the forwarding table that is used for packet forwarding) 475 is maintained in the Routing Table. The Routing Table accumulates 476 routes to directly connected networks, static routes, routes learned 477 from the IGP protocols, and routes learned from BGP. Whether or not 478 a specific BGP route should be installed in the Routing Table, and 479 whether a BGP route should override a route to the same destination 480 installed by another source, is a local policy decision, not specified 481 in this document. Besides actual packet forwarding, the Routing Table 482 is used for resolution of the next-hop addresses specified in BGP 483 updates (see Section 5.1.3).

485 4. Message Formats

487 This section describes message formats used by BGP.

489 BGP messages are sent over a TCP connection. A message is processed 490 only after it is entirely received. The maximum message size is 4096 491 octets. All implementations are required to support this maximum message 492 size. The smallest message that may be sent consists of a BGP 493 header without a data portion (19 octets).

495 All multi-octet fields are in network byte order.

497 4.1 Message Header Format

499 Each message has a fixed-size header. There may or may not be a data 500 portion following the header, depending on the message type. The layout 501 of these fields is shown below:

505  0                   1                   2                   3
506  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
507 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
508 |                                                               |
509 +                                                               +
510 |                                                               |
511 +                                                               +
512 |                           Marker                              |
513 +                                                               +
514 |                                                               |
515 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
516 |          Length               |      Type     |
517 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

519 Marker:

521 This 16-octet field is included for compatibility; it MUST be 522 set to all ones.
524 Length:

526 This 2-octet unsigned integer indicates the total length of the 527 message, including the header, in octets. Thus, it allows 528 one to locate the (Marker field of the) next 529 message in the TCP stream. The value of the Length field MUST always be at least 530 19 and no greater than 4096, and MAY be further constrained, 531 depending on the message type. No "padding" of extra data after 532 the message is allowed, so the Length field MUST have the 533 smallest value required given the rest of the message.

535 Type:

537 This 1-octet unsigned integer indicates the type code of the 538 message. This document defines the following type codes:

540 1 - OPEN
541 2 - UPDATE
542 3 - NOTIFICATION
543 4 - KEEPALIVE

545 [RFC2918] defines one more type code.

547 4.2 OPEN Message Format

549 After a TCP connection is established, the first message sent by each side is an 550 OPEN message. If the OPEN message is acceptable, a KEEPALIVE message 553 confirming the OPEN is sent back.

555 In addition to the fixed-size BGP header, the OPEN message contains 556 the following fields:

558  0                   1                   2                   3
559  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
560 +-+-+-+-+-+-+-+-+
561 |    Version    |
562 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
563 |     My Autonomous System      |
564 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
565 |           Hold Time           |
566 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
567 |                         BGP Identifier                        |
568 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
569 | Opt Parm Len  |
570 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
571 |                                                               |
572 |             Optional Parameters (variable)                    |
573 |                                                               |
574 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

576 Version:

578 This 1-octet unsigned integer indicates the protocol version 579 number of the message. The current BGP version number is 4.

581 My Autonomous System:

583 This 2-octet unsigned integer indicates the Autonomous System 584 number of the sender.
586 Hold Time:

588 This 2-octet unsigned integer indicates the number of seconds 589 that the sender proposes for the value of the Hold Timer. Upon 590 receipt of an OPEN message, a BGP speaker MUST calculate the 591 value of the Hold Timer by using the smaller of its configured 592 Hold Time and the Hold Time received in the OPEN message. The 593 Hold Time MUST be either zero or at least three seconds. An 594 implementation MAY reject connections on the basis of the Hold 595 Time. The calculated value indicates the maximum number of 596 seconds that may elapse between the receipt of successive 597 KEEPALIVE and/or UPDATE messages by the sender.

599 BGP Identifier:

603 This 4-octet unsigned integer indicates the BGP Identifier of 604 the sender. A given BGP speaker sets the value of its BGP Identifier 605 to an IP address assigned to that BGP speaker. The 606 value of the BGP Identifier is determined on startup and is the 607 same for every local interface and every BGP peer.

609 Optional Parameters Length:

611 This 1-octet unsigned integer indicates the total length of the 612 Optional Parameters field in octets. If the value of this field 613 is zero, no Optional Parameters are present.

615 Optional Parameters:

617 This field contains a list of optional parameters, where each 618 parameter is encoded as a triplet.

621  0                   1
622  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
623 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
624 |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
625 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

627 Parameter Type is a one-octet field that unambiguously identifies 628 individual parameters. Parameter Length is a one-octet 629 field that contains the length of the Parameter Value field in 630 octets. Parameter Value is a variable-length field that is 631 interpreted according to the value of the Parameter Type field.

633 [RFC2842] defines the Capabilities Optional Parameter.
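The header and OPEN layouts described above can be sketched in Python as follows. This is a minimal illustration, not a complete implementation: the function names parse_header, parse_open, and negotiated_hold_time and the returned dictionary keys are assumptions of this example, and error handling is reduced to raising ValueError rather than sending a NOTIFICATION.

```python
import ipaddress
import struct

HEADER_LEN = 19          # 16-octet Marker + 2-octet Length + 1-octet Type
MARKER = b"\xff" * 16    # the Marker MUST be all ones
TYPE_OPEN = 1
OPEN_MIN_LEN = 29        # header plus the fixed part of the OPEN message


def parse_header(data):
    """Check the fixed header and return (length, type)."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated header")
    marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != MARKER:
        raise ValueError("Marker is not all ones")
    if not HEADER_LEN <= length <= 4096:
        raise ValueError("Length outside 19..4096")
    return length, msg_type


def parse_open(data):
    """Parse one complete OPEN message (header included) into a dict."""
    length, msg_type = parse_header(data)
    if msg_type != TYPE_OPEN or length < OPEN_MIN_LEN or len(data) < length:
        raise ValueError("not a complete OPEN message")
    version, my_as, hold_time, bgp_id, opt_len = struct.unpack(
        "!BHH4sB", data[HEADER_LEN:OPEN_MIN_LEN])
    if hold_time in (1, 2):
        raise ValueError("Hold Time must be zero or at least three seconds")
    # No padding is allowed, so the parameters must fill the rest exactly.
    if OPEN_MIN_LEN + opt_len != length:
        raise ValueError("Optional Parameters Length disagrees with Length")
    params, pos = [], OPEN_MIN_LEN
    while pos < length:
        if pos + 2 > length:
            raise ValueError("truncated optional parameter")
        p_type, p_len = data[pos], data[pos + 1]
        if pos + 2 + p_len > length:
            raise ValueError("optional parameter overruns the message")
        params.append((p_type, data[pos + 2:pos + 2 + p_len]))
        pos += 2 + p_len
    return {"version": version, "my_as": my_as, "hold_time": hold_time,
            "bgp_id": str(ipaddress.IPv4Address(bgp_id)), "params": params}


def negotiated_hold_time(configured, received):
    """The Hold Timer value is the smaller of the two Hold Times."""
    return min(configured, received)
```

For example, a speaker configured with a Hold Time of 180 seconds that receives an OPEN proposing 90 seconds would run its Hold Timer at negotiated_hold_time(180, 90), that is, 90 seconds.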
635 The minimum length of the OPEN message is 29 octets (including message 636 header).

638 4.3 UPDATE Message Format

640 UPDATE messages are used to transfer routing information between BGP 641 peers. The information in the UPDATE message can be used to construct 642 a graph describing the relationships of the various Autonomous Systems. 643 By applying rules to be discussed, routing information loops 644 and some other anomalies may be detected and removed from inter-AS 645 routing.

647 An UPDATE message is used to advertise feasible routes sharing common 648 path attributes to a peer, or to withdraw multiple unfeasible routes 651 from service (see Section 3.1). An UPDATE message MAY simultaneously advertise 652 a feasible route and withdraw multiple unfeasible routes from 653 service. The UPDATE message always includes the fixed-size BGP 654 header, and also includes the other fields as shown below (note, some 655 of the shown fields may not be present in every UPDATE message):

657 +-----------------------------------------------------+
658 |   Withdrawn Routes Length (2 octets)                |
659 +-----------------------------------------------------+
660 |   Withdrawn Routes (variable)                       |
661 +-----------------------------------------------------+
662 |   Total Path Attribute Length (2 octets)            |
663 +-----------------------------------------------------+
664 |   Path Attributes (variable)                        |
665 +-----------------------------------------------------+
666 |   Network Layer Reachability Information (variable) |
667 +-----------------------------------------------------+

669 Withdrawn Routes Length:

671 This 2-octet unsigned integer indicates the total length of 672 the Withdrawn Routes field in octets. Its value allows the 673 length of the Network Layer Reachability Information field to 674 be determined as specified below.
676 A value of 0 indicates that no routes are being withdrawn from 677 service, and that the WITHDRAWN ROUTES field is not present in 678 this UPDATE message.

680 Withdrawn Routes:

682 This is a variable-length field that contains a list of IP 683 address prefixes for the routes that are being withdrawn from 684 service. Each IP address prefix is encoded as a 2-tuple of the 685 form <length, prefix>, whose fields are described below:

687 +---------------------------+
688 |   Length (1 octet)        |
689 +---------------------------+
690 |   Prefix (variable)       |
691 +---------------------------+

693 The use and the meaning of these fields are as follows:

695 a) Length:

699 The Length field indicates the length in bits of the IP 700 address prefix. A length of zero indicates a prefix that 701 matches all IP addresses (with prefix, itself, of zero 702 octets).

704 b) Prefix:

706 The Prefix field contains an IP address prefix followed by 707 the minimum number of trailing bits needed to make the end 708 of the field fall on an octet boundary. Note that the value 709 of trailing bits is irrelevant.

711 Total Path Attribute Length:

713 This 2-octet unsigned integer indicates the total length of the 714 Path Attributes field in octets. Its value allows the length of 715 the Network Layer Reachability field to be determined as specified 716 below.

718 A value of 0 indicates that no Network Layer Reachability 719 Information field is present in this UPDATE message.

721 Path Attributes:

723 A variable-length sequence of path attributes is present in 724 every UPDATE message, except for an UPDATE message that carries 725 only the withdrawn routes. Each path attribute is a triple 726 <attribute type, attribute length, attribute value> of variable 727 length.

729 Attribute Type is a two-octet field that consists of the 730 Attribute Flags octet followed by the Attribute Type Code 731 octet.

733  0                   1
734  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
735 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
736 |  Attr. Flags  |Attr. Type Code|
737 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

739 The high-order bit (bit 0) of the Attribute Flags octet is the 740 Optional bit. It defines whether the attribute is optional (if 741 set to 1) or well-known (if set to 0).

743 The second high-order bit (bit 1) of the Attribute Flags octet 744 is the Transitive bit. It defines whether an optional attribute 745 is transitive (if set to 1) or non-transitive (if set to 0). 746 For well-known attributes, the Transitive bit MUST be set to 1. 750 (See Section 5 for a discussion of transitive attributes.)

752 The third high-order bit (bit 2) of the Attribute Flags octet 753 is the Partial bit. It defines whether the information contained 754 in the optional transitive attribute is partial (if set 755 to 1) or complete (if set to 0). For well-known attributes and 756 for optional non-transitive attributes the Partial bit MUST be 757 set to 0.

759 The fourth high-order bit (bit 3) of the Attribute Flags octet 760 is the Extended Length bit. It defines whether the Attribute 761 Length is one octet (if set to 0) or two octets (if set to 1).

763 The lower-order four bits of the Attribute Flags octet are 764 unused. They MUST be zero when sent and MUST be ignored when 765 received.

767 The Attribute Type Code octet contains the Attribute Type Code. 768 Currently defined Attribute Type Codes are discussed in Section 769 5.

771 If the Extended Length bit of the Attribute Flags octet is set 772 to 0, the third octet of the Path Attribute contains the length 773 of the attribute data in octets.

775 If the Extended Length bit of the Attribute Flags octet is set 776 to 1, then the third and the fourth octets of the path 777 attribute contain the length of the attribute data in octets.

779 The remaining octets of the Path Attribute represent the 780 attribute value and are interpreted according to the Attribute 781 Flags and the Attribute Type Code.
The supported Attribute Type 782 Codes, their attribute values and uses are the following:

784 a) ORIGIN (Type Code 1):

786 ORIGIN is a well-known mandatory attribute that defines the 787 origin of the path information. The data octet can assume 788 the following values:

790 Value   Meaning
792 0       IGP - Network Layer Reachability Information 793 is interior to the originating AS
795 1       EGP - Network Layer Reachability Information 796 learned via the EGP protocol [RFC904]
799 2       INCOMPLETE - Network Layer Reachability 800 Information learned by some other means

802 Usage of this attribute is defined in 5.1.1.

804 b) AS_PATH (Type Code 2):

806 AS_PATH is a well-known mandatory attribute that is composed 807 of a sequence of AS path segments. Each AS path segment is 808 represented by a triple <path segment type, path segment length, path segment value>.

811 The path segment type is a 1-octet long field with the following 812 values defined:

814 Value   Segment Type
816 1       AS_SET: unordered set of ASs a route in the 817 UPDATE message has traversed
819 2       AS_SEQUENCE: ordered set of ASs a route in 820 the UPDATE message has traversed

822 The path segment length is a 1-octet long field containing 823 the number of ASs (not the number of octets) in the path 824 segment value field.

826 The path segment value field contains one or more AS numbers, 827 each encoded as a 2-octet field.

829 Usage of this attribute is defined in 5.1.2.

831 c) NEXT_HOP (Type Code 3):

833 This is a well-known mandatory attribute that defines the 834 (unicast) IP address of the router that SHOULD be used as 835 the next hop to the destinations listed in the Network Layer 836 Reachability Information field of the UPDATE message.

838 Usage of this attribute is defined in 5.1.3.

840 d) MULTI_EXIT_DISC (Type Code 4):

842 This is an optional non-transitive attribute that is a four 843 octet unsigned integer.
The value of this attribute MAY be 844 used by a BGP speaker's decision process to discriminate 845 among multiple entry points to a neighboring autonomous 846 RFC DRAFT March 2003 848 system. 850 Usage of this attribute is defined in 5.1.4. 852 e) LOCAL_PREF (Type Code 5): 854 LOCAL_PREF is a well-known attribute that is a four octet 855 unsigned integer. A BGP speaker uses it to inform other 856 internal peers of the advertising speaker's degree of pref- 857 erence for an advertised route. 859 Usage of this attribute is defined in 5.1.5. 861 f) ATOMIC_AGGREGATE (Type Code 6) 863 ATOMIC_AGGREGATE is a well-known discretionary attribute of 864 length 0. 866 Usage of this attribute is defined in 5.1.6. 868 g) AGGREGATOR (Type Code 7) 870 AGGREGATOR is an optional transitive attribute of length 6. 871 The attribute contains the last AS number that formed the 872 aggregate route (encoded as 2 octets), followed by the IP 873 address of the BGP speaker that formed the aggregate route 874 (encoded as 4 octets). This SHOULD be the same address as 875 the one used for the BGP Identifier of the speaker. 877 Usage of this attribute is defined in 5.1.7. 879 Network Layer Reachability Information: 881 This variable length field contains a list of IP address pre- 882 fixes. The length in octets of the Network Layer Reachability 883 Information is not encoded explicitly, but can be calculated 884 as: 886 UPDATE message Length - 23 - Total Path Attributes Length - 887 Withdrawn Routes Length 889 where UPDATE message Length is the value encoded in the fixed- 890 size BGP header, Total Path Attribute Length and Withdrawn 891 Routes Length are the values encoded in the variable part of 892 the UPDATE message, and 23 is a combined length of the fixed- 893 size BGP header, the Total Path Attribute Length field and the 894 Withdrawn Routes Length field. 
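The length arithmetic above, together with the path attribute and prefix encodings described earlier in this section, can be sketched in Python as follows. This is an illustration under stated assumptions: the function names split_update, walk_attributes, and decode_prefixes are hypothetical, prefixes are assumed to be IPv4, and malformed input simply raises ValueError.

```python
import ipaddress
import struct


def split_update(msg):
    """Split a whole UPDATE (header included) into withdrawn, attributes, NLRI."""
    if len(msg) < 23:
        raise ValueError("shorter than the minimum UPDATE message")
    (length,) = struct.unpack("!H", msg[16:18])    # Length field of the header
    (wd_len,) = struct.unpack("!H", msg[19:21])    # Withdrawn Routes Length
    wd_end = 21 + wd_len
    if length < 23 or len(msg) < length or wd_end + 2 > length:
        raise ValueError("inconsistent lengths")
    (attr_len,) = struct.unpack("!H", msg[wd_end:wd_end + 2])
    # NLRI length = message Length - 23 - Total Path Attribute Length
    #               - Withdrawn Routes Length
    nlri_len = length - 23 - attr_len - wd_len
    if nlri_len < 0:
        raise ValueError("inconsistent lengths")
    attr_start = wd_end + 2
    attr_end = attr_start + attr_len
    return msg[21:wd_end], msg[attr_start:attr_end], msg[attr_end:length]


def walk_attributes(buf):
    """Return a list of (flags, type_code, value) for each path attribute."""
    attrs, pos = [], 0
    while pos < len(buf):
        if pos + 3 > len(buf):
            raise ValueError("truncated attribute header")
        flags, code = buf[pos], buf[pos + 1]
        if flags & 0x10:                 # Extended Length bit: 2-octet length
            if pos + 4 > len(buf):
                raise ValueError("truncated attribute header")
            (alen,) = struct.unpack("!H", buf[pos + 2:pos + 4])
            hdr = 4
        else:                            # otherwise a 1-octet length
            alen, hdr = buf[pos + 2], 3
        value = buf[pos + hdr:pos + hdr + alen]
        if len(value) != alen:
            raise ValueError("truncated attribute value")
        attrs.append((flags, code, value))
        pos += hdr + alen
    return attrs


def decode_prefixes(buf):
    """Decode <length, prefix> 2-tuples into (address, bits) pairs (IPv4 assumed)."""
    prefixes, pos = [], 0
    while pos < len(buf):
        bits = buf[pos]
        if bits > 32:
            raise ValueError("prefix length exceeds 32 bits")
        n_octets = (bits + 7) // 8       # the prefix is padded to an octet boundary
        raw = buf[pos + 1:pos + 1 + n_octets]
        if len(raw) != n_octets:
            raise ValueError("truncated prefix")
        value = int.from_bytes(raw.ljust(4, b"\x00"), "big")
        mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
        # Trailing bits are irrelevant, so they are masked off here.
        prefixes.append((str(ipaddress.IPv4Address(value & mask)), bits))
        pos += 1 + n_octets
    return prefixes
```

The same decode_prefixes helper applies to both the Withdrawn Routes field and the Network Layer Reachability Information field, since both use the same 2-tuple encoding.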
898 Reachability information is encoded as one or more 2-tuples of 899 the form <length, prefix>, whose fields are described below:

901 +---------------------------+
902 |   Length (1 octet)        |
903 +---------------------------+
904 |   Prefix (variable)       |
905 +---------------------------+

907 The use and the meaning of these fields are as follows:

909 a) Length:

911 The Length field indicates the length in bits of the IP 912 address prefix. A length of zero indicates a prefix that 913 matches all IP addresses (with prefix, itself, of zero 914 octets).

916 b) Prefix:

918 The Prefix field contains an IP address prefix followed by 919 enough trailing bits to make the end of the field fall on an 920 octet boundary. Note that the value of the trailing bits is 921 irrelevant.

923 The minimum length of the UPDATE message is 23 octets -- 19 octets 924 for the fixed header + 2 octets for the Withdrawn Routes Length + 2 925 octets for the Total Path Attribute Length (the value of Withdrawn 926 Routes Length is 0 and the value of Total Path Attribute Length is 927 0).

929 An UPDATE message can advertise at most one set of path attributes, 930 but multiple destinations, provided that the destinations share these 931 attributes. All path attributes contained in a given UPDATE message 932 apply to all destinations carried in the NLRI field of the UPDATE 933 message.

935 An UPDATE message can list multiple routes to be withdrawn from service. 936 Each such route is identified by its destination (expressed as 937 an IP prefix), which unambiguously identifies the route in the context 938 of the BGP speaker - BGP speaker connection to which it has been 939 previously advertised.

941 An UPDATE message might advertise only routes to be withdrawn from 942 service, in which case it will not include path attributes or Network 943 Layer Reachability Information.
Conversely, it may advertise only a 944 feasible route, in which case the WITHDRAWN ROUTES field need not be 945 present.

949 An UPDATE message SHOULD NOT include the same address prefix in the 950 WITHDRAWN ROUTES and Network Layer Reachability Information fields; 951 however, a BGP speaker MUST be able to process UPDATE messages in this 952 form. A BGP speaker SHOULD treat an UPDATE message of this form as if 953 the WITHDRAWN ROUTES doesn't contain the address prefix.

955 4.4 KEEPALIVE Message Format

957 BGP does not use any TCP-based keep-alive mechanism to determine if 958 peers are reachable. Instead, KEEPALIVE messages are exchanged 959 between peers often enough not to cause the Hold Timer to expire. 960 A reasonable maximum time between KEEPALIVE messages would be one 961 third of the Hold Time interval. KEEPALIVE messages MUST NOT be sent 962 more frequently than one per second. An implementation MAY adjust the 963 rate at which it sends KEEPALIVE messages as a function of the Hold 964 Time interval.

966 If the negotiated Hold Time interval is zero, then periodic KEEPALIVE 967 messages MUST NOT be sent.

969 A KEEPALIVE message consists of only the message header and has a length 970 of 19 octets.

972 4.5 NOTIFICATION Message Format

974 A NOTIFICATION message is sent when an error condition is detected. 975 The BGP connection is closed immediately after sending it.

977 In addition to the fixed-size BGP header, the NOTIFICATION message 978 contains the following fields:

980  0                   1                   2                   3
981  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
982 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
983 | Error code    | Error subcode |   Data (variable)             |
984 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

986 Error Code:

988 This 1-octet unsigned integer indicates the type of 989 NOTIFICATION.
The following Error Codes have been defined:

991 Error Code   Symbolic Name                Reference
994 1            Message Header Error         Section 6.1
996 2            OPEN Message Error           Section 6.2
998 3            UPDATE Message Error         Section 6.3
1000 4           Hold Timer Expired           Section 6.5
1002 5           Finite State Machine Error   Section 6.6
1004 6           Cease                        Section 6.7

1006 Error subcode:

1008 This 1-octet unsigned integer provides more specific information 1009 about the nature of the reported error. Each Error Code 1010 may have one or more Error Subcodes associated with it. If no 1011 appropriate Error Subcode is defined, then a zero (Unspecific) 1012 value is used for the Error Subcode field.

1014 Message Header Error subcodes:

1016 1 - Connection Not Synchronized.
1017 2 - Bad Message Length.
1018 3 - Bad Message Type.

1020 OPEN Message Error subcodes:

1022 1 - Unsupported Version Number.
1023 2 - Bad Peer AS.
1024 3 - Bad BGP Identifier.
1025 4 - Unsupported Optional Parameter.
1026 5 - [Deprecated - see Appendix A].
1027 6 - Unacceptable Hold Time.

1029 UPDATE Message Error subcodes:

1031 1 - Malformed Attribute List.
1032 2 - Unrecognized Well-known Attribute.
1033 3 - Missing Well-known Attribute.
1034 4 - Attribute Flags Error.
1035 5 - Attribute Length Error.
1036 6 - Invalid ORIGIN Attribute.
1037 7 - [Deprecated - see Appendix A].
1038 8 - Invalid NEXT_HOP Attribute.
1039 9 - Optional Attribute Error.
1040 10 - Invalid Network Field.
1044 11 - Malformed AS_PATH.

1046 Data:

1048 This variable-length field is used to diagnose the reason for 1049 the NOTIFICATION. The contents of the Data field depend upon 1050 the Error Code and Error Subcode. See Section 6 below for more 1051 details.

1053 Note that the length of the Data field can be determined from 1054 the message Length field by the formula:

1056 Message Length = 21 + Data Length

1058 The minimum length of the NOTIFICATION message is 21 octets (including 1059 message header).

1061 5. Path Attributes

1063 This section discusses the path attributes of the UPDATE message.

1065 Path attributes fall into four separate categories:

1067 1. Well-known mandatory.
1068 2. Well-known discretionary.
1069 3. Optional transitive.
1070 4. Optional non-transitive.

1072 Well-known attributes MUST be recognized by all BGP implementations. 1073 Some of these attributes are mandatory and MUST be included in every 1074 UPDATE message that contains NLRI. Others are discretionary and MAY 1075 be either sent or omitted in a particular UPDATE message.

1077 All well-known attributes MUST be passed along (after proper updating, 1078 if necessary) to other BGP peers.

1080 In addition to well-known attributes, each path MAY contain one or 1081 more optional attributes. It is not required or expected that all BGP 1082 implementations support all optional attributes. The handling of an 1083 unrecognized optional attribute is determined by the setting of the 1084 Transitive bit in the attribute flags octet. Paths with unrecognized 1085 transitive optional attributes SHOULD be accepted. If a path with 1086 an unrecognized transitive optional attribute is accepted and passed 1087 along to other BGP peers, then the unrecognized transitive optional 1088 attribute of that path MUST be passed along with the path to other 1091 BGP peers with the Partial bit in the Attribute Flags octet set to 1. 1092 If a path with a recognized transitive optional attribute is accepted 1093 and passed along to other BGP peers and the Partial bit in the 1094 Attribute Flags octet is set to 1 by some previous AS, it is not set 1095 back to 0 by the current AS. Unrecognized non-transitive optional 1096 attributes MUST be quietly ignored and not passed along to other BGP 1097 peers.

1099 New transitive optional attributes MAY be attached to the path by the 1100 originator or by any other BGP speaker in the path.
If they are not 1101 attached by the originator, the Partial bit in the Attribute Flags 1102 octet is set to 1. The rules for attaching new non-transitive 1103 optional attributes will depend on the nature of the specific 1104 attribute. The documentation of each new non-transitive optional 1105 attribute will be expected to include such rules. (The description of 1106 the MULTI_EXIT_DISC attribute gives an example.) All optional 1107 attributes (both transitive and non-transitive) MAY be updated (if 1108 appropriate) by BGP speakers in the path.

1110 The sender of an UPDATE message SHOULD order path attributes within 1111 the UPDATE message in ascending order of attribute type. The receiver 1112 of an UPDATE message MUST be prepared to handle path attributes 1113 within the UPDATE message that are out of order.

1115 The same attribute (attribute with the same type) cannot appear more 1116 than once within the Path Attributes field of a particular UPDATE 1117 message.

1119 The mandatory category refers to an attribute which MUST be present 1120 in both IBGP and EBGP exchanges if NLRI are contained in the UPDATE 1121 message. Attributes classified as optional for the purpose of the 1122 protocol extension mechanism may be purely discretionary, or discretionary, 1123 required, or disallowed in certain contexts.

1125 attribute          EBGP                  IBGP
1126 ORIGIN             mandatory             mandatory
1127 AS_PATH            mandatory             mandatory
1128 NEXT_HOP           mandatory             mandatory
1129 MULTI_EXIT_DISC    discretionary         discretionary
1130 LOCAL_PREF         see Section 5.1.5     required
1131 ATOMIC_AGGREGATE   see Sections 5.1.6 and 9.1.4
1132 AGGREGATOR         discretionary         discretionary

1135 5.1 Path Attribute Usage

1137 The usage of each BGP path attribute is described in the following 1138 clauses.

1140 5.1.1 ORIGIN

1142 ORIGIN is a well-known mandatory attribute. The ORIGIN attribute is 1143 generated by the speaker that originates the associated routing 1144 information.
Its value SHOULD NOT be changed by any other speaker.

1146 5.1.2 AS_PATH

1148 AS_PATH is a well-known mandatory attribute. This attribute identifies 1149 the autonomous systems through which routing information carried 1150 in this UPDATE message has passed. The components of this list can be 1151 AS_SETs or AS_SEQUENCEs.

1153 When a BGP speaker propagates a route which it has learned from 1154 another BGP speaker's UPDATE message, it modifies the route's AS_PATH 1155 attribute based on the location of the BGP speaker to which the route 1156 will be sent:

1158 a) When a given BGP speaker advertises the route to an internal 1159 peer, the advertising speaker SHALL NOT modify the AS_PATH 1160 attribute associated with the route.

1162 b) When a given BGP speaker advertises the route to an external 1163 peer, then the advertising speaker updates the AS_PATH attribute 1164 as follows:

1166 1) if the first path segment of the AS_PATH is of type 1167 AS_SEQUENCE, the local system prepends its own AS number as the 1168 last element of the sequence (that is, puts it in the leftmost position). 1169 If the act of prepending will cause an overflow in the AS_PATH 1170 segment, i.e., more than 255 ASs, it is legal to prepend a new 1171 segment of type AS_SEQUENCE and prepend its own AS number to 1172 this new segment.

1174 2) if the first path segment of the AS_PATH is of type AS_SET, 1175 the local system prepends a new path segment of type 1176 AS_SEQUENCE to the AS_PATH, including its own AS number in that 1179 segment.

1181 When a BGP speaker originates a route then:

1183 a) the originating speaker includes its own AS number in a path 1184 segment of type AS_SEQUENCE in the AS_PATH attribute of all UPDATE 1185 messages sent to an external peer. (In this case, the AS number of 1186 the originating speaker's autonomous system will be the only entry 1187 in the path segment, and this path segment will be the only segment 1188 in the AS_PATH attribute).
1190 b) the originating speaker includes an empty AS_PATH attribute in 1191 all UPDATE messages sent to internal peers. (An empty AS_PATH 1192 attribute is one whose length field contains the value zero). 1194 Whenever the modification of the AS_PATH attribute calls for includ- 1195 ing or prepending the AS number of the local system, the local system 1196 MAY include/prepend more than one instance of its own AS number in 1197 the AS_PATH attribute. This is controlled via local configuration. 1199 5.1.3 NEXT_HOP 1201 The NEXT_HOP is a well-known mandatory attribute that defines the IP 1202 address of the router that SHOULD be used as the next hop to the des- 1203 tinations listed in the UPDATE message. The NEXT_HOP attribute is 1204 calculated as follows. 1206 1) When sending a message to an internal peer, if the route is not 1207 locally originated the BGP speaker SHOULD NOT modify the NEXT_HOP 1208 attribute, unless it has been explicitly configured to announce 1209 its own IP address as the NEXT_HOP. When announcing a locally 1210 originated route to an internal peer, the BGP speaker SHOULD use 1211 as the NEXT_HOP the interface address of the router through which 1212 the announced network is reachable for the speaker; if the route 1213 is directly connected to the speaker, or the interface address of 1214 the router through which the announced network is reachable for 1215 the speaker is the internal peer's address, then the BGP speaker 1216 SHOULD use for the NEXT_HOP attribute its own IP address (the 1217 address of the interface that is used to reach the peer). 
1219 2) When sending a message to an external peer X, and the peer is 1220 one IP hop away from the speaker: 1222 - If the route being announced was learned from an internal 1223 peer or is locally originated, the BGP speaker can use for the 1224 NEXT_HOP attribute an interface address of the internal peer 1225 RFC DRAFT March 2003 1227 router (or the internal router) through which the announced 1228 network is reachable for the speaker, provided that peer X 1229 shares a common subnet with this address. This is a form of 1230 "third party" NEXT_HOP attribute. 1232 - Otherwise, if the route being announced was learned from an 1233 external peer, the speaker can use in the NEXT_HOP attribute an 1234 IP address of any adjacent router (known from the received 1235 NEXT_HOP attribute) that the speaker itself uses for local 1236 route calculation, provided that peer X shares a common subnet 1237 with this address. This is a second form of "third party" 1238 NEXT_HOP attribute. 1240 - Otherwise, if the external peer to which the route is being 1241 advertised shares a common subnet with one of the interfaces of 1242 the announcing BGP speaker, the speaker MAY use the IP address 1243 associated with such an interface in the NEXT_HOP attribute. 1244 This is known as a "first party" NEXT_HOP attribute. 1246 - By default (if none of the above conditions apply), the BGP 1247 speaker SHOULD use in the NEXT_HOP attribute the IP address of 1248 the interface that the speaker uses to establish the BGP con- 1249 nection to peer X. 1251 3) When sending a message to an external peer X, and the peer is 1252 multiple IP hops away from the speaker (aka "multihop EBGP"): 1254 - The speaker MAY be configured to propagate the NEXT_HOP 1255 attribute. 
         In this case when advertising a route that the speaker
         learned from one of its peers, the NEXT_HOP attribute of the
         advertised route is exactly the same as the NEXT_HOP attribute
         of the learned route (the speaker just doesn't modify the
         NEXT_HOP attribute).

         - By default, the BGP speaker SHOULD use in the NEXT_HOP
         attribute the IP address of the interface that the speaker uses
         to establish the BGP connection to peer X.

   Normally the NEXT_HOP attribute is chosen such that the shortest
   available path will be taken. A BGP speaker MUST be able to support
   disabling advertisement of third party NEXT_HOP attributes to handle
   imperfectly bridged media.

   A route originated by a BGP speaker SHALL NOT be advertised to a peer
   using an address of that peer as NEXT_HOP. A BGP speaker SHALL NOT
   install a route with itself as the next hop.

   The NEXT_HOP attribute is used by the BGP speaker to determine the
   actual outbound interface and immediate next-hop address that SHOULD
   be used to forward transit packets to the associated destinations.

   The immediate next-hop address is determined by performing a
   recursive route lookup operation for the IP address in the NEXT_HOP
   attribute using the contents of the Routing Table, selecting one
   entry if multiple entries of equal cost exist. The Routing Table
   entry which resolves the IP address in the NEXT_HOP attribute will
   always specify the outbound interface. If the entry specifies an
   attached subnet, but does not specify a next-hop address, then the
   address in the NEXT_HOP attribute SHOULD be used as the immediate
   next-hop address. If the entry also specifies the next-hop address,
   this address SHOULD be used as the immediate next-hop address for
   packet forwarding.
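
   The immediate next-hop rule above can be illustrated with a minimal
   Python sketch. It is not part of the protocol specification. The
   RouteEntry type, the resolve_next_hop function, and the use of
   longest-prefix match to pick the resolving entry are assumptions
   made for illustration; the text only requires that one resolving
   Routing Table entry be selected.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteEntry:
    """Hypothetical Routing Table entry (illustration only)."""
    prefix: ipaddress.IPv4Network
    interface: str  # the resolving entry always names an outbound interface
    next_hop: Optional[ipaddress.IPv4Address] = None  # None: attached subnet


def resolve_next_hop(bgp_next_hop, routing_table):
    """Return (outbound interface, immediate next-hop) or None if unresolved.

    Assumption: the resolving entry is chosen by longest-prefix match.
    """
    addr = ipaddress.IPv4Address(bgp_next_hop)
    candidates = [e for e in routing_table if addr in e.prefix]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: e.prefix.prefixlen)
    # Attached subnet without a next-hop: use the NEXT_HOP address itself.
    # Otherwise use the next-hop address given by the resolving entry.
    immediate = best.next_hop if best.next_hop is not None else addr
    return best.interface, immediate
```

   With an entry for an attached subnet, the NEXT_HOP address itself is
   returned as the immediate next hop; with an entry that carries its
   own next-hop address, that address is returned instead.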

5.1.4 MULTI_EXIT_DISC

   The MULTI_EXIT_DISC is an optional non-transitive attribute which is
   intended to be used on external (inter-AS) links to discriminate
   among multiple exit or entry points to the same neighboring AS. The
   value of the MULTI_EXIT_DISC attribute is a four-octet unsigned
   number which is called a metric. All other factors being equal, the
   exit point with the lower metric SHOULD be preferred. If received
   over EBGP, the MULTI_EXIT_DISC attribute MAY be propagated over IBGP
   to other BGP speakers within the same AS. The MULTI_EXIT_DISC
   attribute received from a neighboring AS MUST NOT be propagated to
   other neighboring ASs.

   A BGP speaker MUST implement a mechanism based on local configuration
   which allows the MULTI_EXIT_DISC attribute to be removed from a
   route. This MAY be done prior to determining the degree of preference
   of the route and performing route selection (decision process phases
   1 and 2).

   An implementation MAY also (based on local configuration) alter the
   value of the MULTI_EXIT_DISC attribute received over EBGP. This MAY
   be done prior to determining the degree of preference of the route
   and performing route selection (decision process phases 1 and 2). See
   Section 9.1.2.2 for necessary restrictions on this.

5.1.5 LOCAL_PREF

   LOCAL_PREF is a well-known attribute that SHALL be included in all
   UPDATE messages that a given BGP speaker sends to the other internal
   peers. A BGP speaker SHALL calculate the degree of preference for
   each external route based on the locally configured policy, and
   include the degree of preference when advertising a route to its
   internal peers. The higher degree of preference MUST be preferred. A
   BGP speaker uses the degree of preference learned via LOCAL_PREF in
   its decision process (see Section 9.1.1).

   A BGP speaker MUST NOT include this attribute in UPDATE messages that
   it sends to external peers, except for the case of BGP Confederations
   [RFC3065]. If it is contained in an UPDATE message that is received
   from an external peer, then this attribute MUST be ignored by the
   receiving speaker, except for the case of BGP Confederations
   [RFC3065].

5.1.6 ATOMIC_AGGREGATE

   ATOMIC_AGGREGATE is a well-known discretionary attribute.

   When a BGP speaker aggregates several routes for the purpose of
   advertisement to a particular peer, the AS_PATH of the aggregated
   route normally includes an AS_SET formed from the set of ASs from
   which the aggregate was formed. In many cases the network
   administrator can determine that the aggregate can safely be
   advertised without the AS_SET and not form route loops.

   If an aggregate excludes at least some of the AS numbers present in
   the AS_PATH of the routes that are aggregated as a result of dropping
   the AS_SET, the aggregated route, when advertised to the peer, SHOULD
   include the ATOMIC_AGGREGATE attribute.

   A BGP speaker that receives a route with the ATOMIC_AGGREGATE
   attribute SHOULD NOT remove the attribute from the route when
   propagating it to other speakers.

   A BGP speaker that receives a route with the ATOMIC_AGGREGATE
   attribute MUST NOT make any NLRI of that route more specific (as
   defined in 9.1.4) when advertising this route to other BGP speakers.

   A BGP speaker that receives a route with the ATOMIC_AGGREGATE
   attribute needs to be cognizant of the fact that the actual path to
   destinations, as specified in the NLRI of the route, while having the
   loop-free property, may not be the path specified in the AS_PATH
   attribute of the route.
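
   The condition under which an aggregate SHOULD carry ATOMIC_AGGREGATE
   can be sketched as a set comparison. The following Python fragment
   is an illustration only; the function name and the representation of
   each AS_PATH as a flat sequence of AS numbers (with any AS_SET
   members already flattened in) are assumptions, not part of this
   specification.

```python
def needs_atomic_aggregate(component_as_paths, aggregate_as_path):
    """Return True if the aggregate omits an AS number that appears in
    the AS_PATH of at least one of the aggregated routes.

    Assumption: every path is a flat iterable of AS numbers.
    """
    contributed = set()
    for path in component_as_paths:
        contributed.update(path)
    # Any AS number seen in a component route but missing from the
    # aggregate's AS_PATH means path information was dropped.
    return not contributed.issubset(set(aggregate_as_path))
```

   For example, aggregating routes with paths (65001, 65010) and
   (65001, 65020) into a route whose AS_PATH is only (65001) drops
   65010 and 65020, so the function returns True.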

5.1.7 AGGREGATOR

   AGGREGATOR is an optional transitive attribute which MAY be included
   in updates which are formed by aggregation (see Section 9.2.2.2). A
   BGP speaker which performs route aggregation MAY add the AGGREGATOR
   attribute which SHALL contain its own AS number and IP address. The
   IP address SHOULD be the same as the BGP Identifier of the speaker.

6. BGP Error Handling.

   This section describes actions to be taken when errors are detected
   while processing BGP messages.

   When any of the conditions described here are detected, a
   NOTIFICATION message with the indicated Error Code, Error Subcode,
   and Data fields is sent, and the BGP connection is closed, unless it
   is explicitly stated that no NOTIFICATION message is to be sent and
   the BGP connection is not to be closed. If no Error Subcode is
   specified, then a zero MUST be used.

   The phrase "the BGP connection is closed" means that the TCP
   connection has been closed, the associated Adj-RIB-In has been
   cleared, and that all resources for that BGP connection have been
   deallocated. Entries in the Loc-RIB associated with the remote peer
   are marked as invalid. The fact that the routes have become invalid
   is passed to other BGP peers before the routes are deleted from the
   system.

   Unless specified explicitly, the Data field of the NOTIFICATION
   message that is sent to indicate an error is empty.

6.1 Message Header error handling.

   All errors detected while processing the Message Header are indicated
   by sending the NOTIFICATION message with Error Code Message Header
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   The expected value of the Marker field of the message header is all
   ones.
   If the Marker field of the message header is not as expected,
   then a synchronization error has occurred and the Error Subcode is
   set to Connection Not Synchronized.

   If the Length field of the message header is less than 19 or greater
   than 4096, or if the Length field of an OPEN message is less than the
   minimum length of the OPEN message, or if the Length field of an
   UPDATE message is less than the minimum length of the UPDATE message,
   or if the Length field of a KEEPALIVE message is not equal to 19, or
   if the Length field of a NOTIFICATION message is less than the
   minimum length of the NOTIFICATION message, then the Error Subcode is
   set to Bad Message Length. The Data field contains the erroneous
   Length field.

   If the Type field of the message header is not recognized, then the
   Error Subcode is set to Bad Message Type. The Data field contains the
   erroneous Type field.

6.2 OPEN message error handling.

   All errors detected while processing the OPEN message are indicated
   by sending the NOTIFICATION message with Error Code OPEN Message
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   If the version number contained in the Version field of the received
   OPEN message is not supported, then the Error Subcode is set to
   Unsupported Version Number. The Data field is a 2-octet unsigned
   integer, which indicates the largest locally supported version number
   less than the version the remote BGP peer bid (as indicated in the
   received OPEN message), or if the smallest locally supported version
   number is greater than the version the remote BGP peer bid, then the
   smallest locally supported version number.

   If the Autonomous System field of the OPEN message is unacceptable,
   then the Error Subcode is set to Bad Peer AS.
   The determination of acceptable Autonomous System numbers is outside
   the scope of this protocol.

   If the Hold Time field of the OPEN message is unacceptable, then the
   Error Subcode MUST be set to Unacceptable Hold Time. An
   implementation MUST reject Hold Time values of one or two seconds. An
   implementation MAY reject any proposed Hold Time. An implementation
   which accepts a Hold Time MUST use the negotiated value for the Hold
   Time.

   If the BGP Identifier field of the OPEN message is syntactically
   incorrect, then the Error Subcode is set to Bad BGP Identifier.
   Syntactic correctness means that the BGP Identifier field represents
   a valid IP host address.

   If one of the Optional Parameters in the OPEN message is not
   recognized, then the Error Subcode is set to Unsupported Optional
   Parameters.

   If one of the Optional Parameters in the OPEN message is recognized,
   but is malformed, then the Error Subcode is set to 0 (Unspecific).

6.3 UPDATE message error handling.

   All errors detected while processing the UPDATE message are indicated
   by sending the NOTIFICATION message with Error Code UPDATE Message
   Error. The error subcode elaborates on the specific nature of the
   error.

   Error checking of an UPDATE message begins by examining the path
   attributes. If the Withdrawn Routes Length or Total Attribute Length
   is too large (i.e., if Withdrawn Routes Length + Total Attribute
   Length + 23 exceeds the message Length), then the Error Subcode is
   set to Malformed Attribute List.

   If any recognized attribute has Attribute Flags that conflict with
   the Attribute Type Code, then the Error Subcode is set to Attribute
   Flags Error. The Data field contains the erroneous attribute (type,
   length and value).
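
   The Withdrawn Routes Length / Total Attribute Length check described
   earlier in this section can be sketched in Python as follows. This
   is an illustration under stated assumptions, not a definitive
   implementation: the function name and return convention are
   hypothetical, and the message is assumed to have already passed the
   Message Header checks of Section 6.1 (so the buffer holds at least
   the number of octets given by its Length field). The constant 23 is
   the 19-octet header plus the two 2-octet length fields.

```python
import struct

HEADER_LEN = 19  # 16-octet Marker + 2-octet Length + 1-octet Type
MALFORMED_ATTRIBUTE_LIST = "Malformed Attribute List"


def check_update_lengths(message: bytes):
    """Return None if the length fields are consistent, otherwise the
    name of the Error Subcode to report (hypothetical convention)."""
    (msg_len,) = struct.unpack_from("!H", message, 16)
    body = message[HEADER_LEN:msg_len]
    if len(body) < 2:
        return MALFORMED_ATTRIBUTE_LIST
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)
    # The Total Attribute Length field itself must lie inside the message
    # before it can be read.
    if 2 + withdrawn_len + 2 > len(body):
        return MALFORMED_ATTRIBUTE_LIST
    (attr_len,) = struct.unpack_from("!H", body, 2 + withdrawn_len)
    if withdrawn_len + attr_len + 23 > msg_len:
        return MALFORMED_ATTRIBUTE_LIST
    return None
```

   Reading the Total Attribute Length only after bounding the Withdrawn
   Routes Length avoids indexing past the end of the message when the
   first length field is itself corrupt.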

   If any recognized attribute has Attribute Length that conflicts with
   the expected length (based on the attribute type code), then the
   Error Subcode is set to Attribute Length Error. The Data field
   contains the erroneous attribute (type, length and value).

   If any of the mandatory well-known attributes are not present, then
   the Error Subcode is set to Missing Well-known Attribute. The Data
   field contains the Attribute Type Code of the missing well-known
   attribute.

   If any of the mandatory well-known attributes are not recognized,
   then the Error Subcode is set to Unrecognized Well-known Attribute.
   The Data field contains the unrecognized attribute (type, length and
   value).

   If the ORIGIN attribute has an undefined value, then the Error
   Subcode is set to Invalid Origin Attribute. The Data field contains
   the unrecognized attribute (type, length and value).

   If the NEXT_HOP attribute field is syntactically incorrect, then the
   Error Subcode is set to Invalid NEXT_HOP Attribute. The Data field
   contains the incorrect attribute (type, length and value). Syntactic
   correctness means that the NEXT_HOP attribute represents a valid IP
   host address.

   The IP address in the NEXT_HOP MUST meet the following criteria to be
   considered semantically correct:

      a) It MUST NOT be the IP address of the receiving speaker.

      b) In the case of an EBGP where the sender and receiver are one IP
      hop away from each other, either the IP address in the NEXT_HOP
      MUST be the sender's IP address (that is used to establish the BGP
      connection), or the interface associated with the NEXT_HOP IP
      address MUST share a common subnet with the receiving BGP speaker.

   If the NEXT_HOP attribute is semantically incorrect, the error SHOULD
   be logged, and the route SHOULD be ignored.
   In this case, a NOTIFICATION message SHOULD NOT be sent, and the
   connection SHOULD NOT be closed.

   The AS_PATH attribute is checked for syntactic correctness. If the
   path is syntactically incorrect, then the Error Subcode is set to
   Malformed AS_PATH.

   If the UPDATE message is received from an external peer, the local
   system MAY check whether the leftmost AS in the AS_PATH attribute is
   equal to the autonomous system number of the peer that sent the
   message. If the check determines that this is not the case, the Error
   Subcode is set to Malformed AS_PATH.

   If an optional attribute is recognized, then the value of this
   attribute is checked. If an error is detected, the attribute is
   discarded, and the Error Subcode is set to Optional Attribute Error.
   The Data field contains the attribute (type, length and value).

   If any attribute appears more than once in the UPDATE message, then
   the Error Subcode is set to Malformed Attribute List.

   The NLRI field in the UPDATE message is checked for syntactic
   validity. If the field is syntactically incorrect, then the Error
   Subcode is set to Invalid Network Field.

   If a prefix in the NLRI field is semantically incorrect (e.g., an
   unexpected multicast IP address), an error SHOULD be logged locally,
   and the prefix SHOULD be ignored.

   An UPDATE message that contains correct path attributes, but no NLRI,
   SHALL be treated as a valid UPDATE message.

6.4 NOTIFICATION message error handling.

   If a peer sends a NOTIFICATION message, and the receiver of the
   message detects an error in that message, the receiver cannot use a
   NOTIFICATION message to report this error back to the peer. Any such
   error, such as an unrecognized Error Code or Error Subcode, SHOULD be
   noticed, logged locally, and brought to the attention of the
   administration of the peer.
   The means to do this, however, lies outside the scope of this
   document.

6.5 Hold Timer Expired error handling.

   If a system does not receive successive KEEPALIVE and/or UPDATE
   and/or NOTIFICATION messages within the period specified in the Hold
   Time field of the OPEN message, then the NOTIFICATION message with
   Hold Timer Expired Error Code is sent and the BGP connection is
   closed.

6.6 Finite State Machine error handling.

   Any error detected by the BGP Finite State Machine (e.g., receipt of
   an unexpected event) is indicated by sending the NOTIFICATION message
   with Error Code Finite State Machine Error.

6.7 Cease.

   In the absence of any fatal errors (that are indicated in this
   section), a BGP peer MAY choose at any given time to close its BGP
   connection by sending the NOTIFICATION message with Error Code Cease.
   However, the Cease NOTIFICATION message MUST NOT be used when a fatal
   error indicated by this section does exist.

   A BGP speaker MAY support the ability to impose a (locally
   configured) upper bound on the number of address prefixes the speaker
   is willing to accept from a neighbor. When the upper bound is
   reached, the speaker (under control of local configuration) either
   (a) discards new address prefixes from the neighbor (while
   maintaining the BGP connection with the neighbor), or (b) terminates
   the BGP connection with the neighbor. If the BGP speaker decides to
   terminate its BGP connection with a neighbor because the number of
   address prefixes received from the neighbor exceeds the locally
   configured upper bound, then the speaker MUST send to the neighbor a
   NOTIFICATION message with the Error Code Cease.

6.8 BGP connection collision detection.

   If a pair of BGP speakers try simultaneously to establish a BGP
   connection to each other, then two parallel connections between this
   pair of speakers might well be formed. If the source IP address used
   by one of these connections is the same as the destination IP address
   used by the other, and the destination IP address used by the first
   connection is the same as the source IP address used by the other, we
   refer to this situation as connection collision. Clearly in the
   presence of connection collision, one of these connections MUST be
   closed.

   Based on the value of the BGP Identifier a convention is established
   for detecting which BGP connection is to be preserved when a
   collision does occur. The convention is to compare the BGP
   Identifiers of the peers involved in the collision and to retain only
   the connection initiated by the BGP speaker with the higher-valued
   BGP Identifier.

   Upon receipt of an OPEN message, the local system MUST examine all of
   its connections that are in the OpenConfirm state. A BGP speaker MAY
   also examine connections in an OpenSent state if it knows the BGP
   Identifier of the peer by means outside of the protocol. If among
   these connections there is a connection to a remote BGP speaker whose
   BGP Identifier equals the one in the OPEN message, and this
   connection collides with the connection over which the OPEN message
   is received then the local system performs the following collision
   resolution procedure:

      1. The BGP Identifier of the local system is compared to the BGP
      Identifier of the remote system (as specified in the OPEN
      message). Comparing BGP Identifiers is done by converting them to
      host byte order and treating them as (4-octet long) unsigned
      integers.

      2.
      If the value of the local BGP Identifier is less than the
      remote one, the local system closes the BGP connection that
      already exists (the one that is already in the OpenConfirm state),
      and accepts the BGP connection initiated by the remote system.

      3. Otherwise, the local system closes the newly created BGP
      connection (the one associated with the newly received OPEN
      message), and continues to use the existing one (the one that is
      already in the OpenConfirm state).

   Unless allowed via configuration, a connection collision with an
   existing BGP connection that is in Established state causes closing
   of the newly created connection.

   Note that a connection collision cannot be detected with connections
   that are in Idle, or Connect, or Active states.

   Closing the BGP connection (that results from the collision
   resolution procedure) is accomplished by sending the NOTIFICATION
   message with the Error Code Cease.

7. BGP Version Negotiation

   BGP speakers MAY negotiate the version of the protocol by making
   multiple attempts to open a BGP connection, starting with the highest
   version number each supports. If an open attempt fails with an Error
   Code OPEN Message Error, and an Error Subcode Unsupported Version
   Number, then the BGP speaker has available the version number it
   tried, the version number its peer tried, the version number passed
   by its peer in the NOTIFICATION message, and the version numbers that
   it supports. If the two peers do support one or more common versions,
   then this will allow them to rapidly determine the highest common
   version. In order to support BGP version negotiation, future versions
   of BGP MUST retain the format of the OPEN and NOTIFICATION messages.

8. BGP Finite State machine

   This section specifies the BGP operation in terms of a Finite State
   Machine (FSM).
   The section falls into 2 parts:

      1) Description of Events for the State machine (Section 8.1)
      2) Description of the FSM (Section 8.2)

   Session Attributes required for each connection are:

      1) State
      2) Connect Retry timer
      3) Hold timer
      4) Hold time
      5) Keepalive timer
      6) Keepalive time
      7) Connect Retry Count
      8) Connect Retry Initial Value

   The optional Session attributes are listed below. These optional
   attributes may be supported either per connection or per local
   system:

      1) Delay Open flag
      2) Open Delay Timer
      3) Perform automatic start flag
      4) Perform automatic stop flag
      5) Passive TCP establishment flag
      6) Perform BGP peer oscillation damping flag
         (which will be denoted as stop_peer_flap in text)
      7) Idle Hold timer
      8) Perform Collision detect in Established flag
      9) Accept connections from un-configured peers
      10) Track TCP state flag
      11) Send NOTIFICATION without an OPEN flag

8.1 Events for the BGP FSM

8.1.1 Administrative Events

   Please note that only Event 1 (manual start) and Event 2 (manual
   stop) are mandatory administrative events. All other administrative
   events are optional. The optional attributes do not have to be
   supported. However, if these attributes are supported, the state of
   the flags should be as indicated.

      Event1: Manual start

         Definition: Local system administrator manually starts the
                     peer connection.

         Status:     Mandatory

         Optional
         attributes: Passive TCP establishment flag SHOULD NOT be set.

      Event2: Manual stop

         Definition: Local system administrator manually
                     stops the peer connection.

         Status:     Mandatory

      Event3: Automatic start

         Definition: Local system automatically starts the
                     BGP connection.

         Status:     Optional depending on local system.

         Optional
         attributes: 1) Perform automatic start flag SHOULD be set
                        if this event occurs.
                     2) If the Passive TCP establishment flag
                        is supported, it SHOULD NOT be set if this
                        event occurs.
                     3) If BGP peer oscillation damping is supported,
                        the stop_peer_flap flag should not be set
                        when this event occurs.

      Event4: Manual start with passive TCP flag

         Definition: Local system administrator manually starts the peer
                     connection, but has passive TCP establishment
                     enabled. The passive TCP establishment flag
                     indicates that the peer will listen prior to
                     establishing the connection.

         Status:     Optional depending on local system.

         Optional
         attributes: 1) Passive TCP establishment flag SHOULD be set
                        if this event occurs.
                     2) If BGP peer oscillation damping is supported,
                        the stop_peer_flap flag should not be set when
                        this event occurs.

      Event5: Automatic start with passive TCP flag

         Definition: Local system automatically starts the
                     BGP connection with the passive flag
                     enabled. The passive flag indicates
                     that the peer will listen prior to
                     establishing a connection.

         Status:     Optional depending on local system use
                     of a passive connection and automatic start.

         Optional
         attributes: 1) Perform automatic start flag SHOULD be set
                     2) Passive TCP establishment flag SHOULD be set
                     3) If the BGP peer oscillation flag is supported,
                        the stop_peer_flap flag SHOULD NOT be set.

      Event6: Automatic start with bgp_stop_flap option set

         Definition: Local system automatically starts the
                     BGP peer connection with peer oscillation
                     damping enabled. The exact method of damping
                     persistent peer oscillations is left up to the
                     implementation, and is outside the scope of
                     this document.

         Status:     Optional, used only if the BGP peer has
                     BGP peer oscillation damping enabled with the
                     optional attribute settings below.

         Optional
         attributes: 1) Perform automatic start flag SHOULD be set
                     2) stop_peer_flap flag SHOULD be set
                     3) Passive TCP establishment flag SHOULD NOT be set
                        (cleared).

      Event7: Automatic start with bgp_stop_flap option set and passive
              TCP establishment option set

         Definition: Local system automatically starts the
                     BGP peer connection with peer oscillation
                     damping enabled and passive TCP establishment
                     enabled. The exact method of damping
                     persistent peer oscillations is left up to the
                     implementation, and is outside the scope of
                     this document.

         Status:     Optional, used only if the BGP peer has enabled
                     BGP peer oscillation damping with the optional
                     flag settings below.

         Optional
         attributes: 1) Perform automatic start flag SHOULD be set
                     2) stop_peer_flap flag SHOULD be set
                     3) Passive TCP establishment flag SHOULD be set

      Event8: Automatic stop

         Definition: Local system automatically stops the
                     BGP connection.

                     An example of an automatic stop event is
                     exceeding the number of prefixes for a given
                     peer and the local system automatically
                     disconnecting the peer.

         Status:     Optional depending on local system

         Optional
         attributes: 1) Perform automatic stop flag SHOULD be set

8.1.2 Timer Events

      Event9: Connect retry timer expires

         Definition: An event generated when the Connect Retry timer
                     expires.

         Status:     Mandatory

      Event10: Hold timer expires

         Definition: An event generated when the Hold Timer expires.

         Status:     Mandatory

      Event11: Keepalive timer expires

         Definition: An event generated when the Keepalive timer
                     expires.

         Status:     Mandatory

      Event12: Open Delay timer expires

         Definition: An event generated when the Open Delay timer
                     expires.

         Status:     Optional

         Optional
         attributes: If this event occurs,
                     1) Delay Open flag SHOULD be set
                     2) Open Delay timer SHOULD be supported

      Event13: Idle hold timer expires

         Definition: An event generated when the Idle Hold Timer
                     expires indicating that the session has completed
                     waiting for a back-off period to prevent BGP peer
                     oscillation.

                     The Idle Hold Timer is only used when the
                     persistent peer oscillation damping function is
                     enabled.

                     Implementations not implementing the persistent
                     peer oscillation damping function may not have the
                     Idle Hold Timer.

         Status:     Optional

         Optional
         attributes: If this event occurs:
                     1) stop_peer_flap flag SHOULD be set indicating
                        support for persistent peer oscillation damping
                        functions,
                     2) Idle Hold timer should be supported

8.1.3 TCP Connection based Events

      Event14: TCP connection valid indication

         Definition: Event indicating the local system reception of
                     a TCP connection request with a valid source
                     IP address and TCP port, and valid destination
                     IP address and TCP port. The definition of an
                     invalid source or invalid destination
                     IP address is left to the implementation.

                     BGP's destination port SHOULD be port
                     179 as defined by IANA.

                     A TCP connection request is denoted by
                     the local system receiving a TCP SYN.

         Status:     Optional

         Optional
         attributes: 1) The Track TCP state flag SHOULD be set if
                        this event occurs.

      Event15: RCV TCP invalid indication

         Definition: Event indicating the local system reception of
                     a TCP connection request with either
                     an invalid source address or port
                     number or an invalid destination
                     address or port number.

                     BGP destination port number SHOULD be 179
                     as defined by IANA.

                     Again, a TCP connection request is
                     denoted by the local system receiving a TCP
                     SYN.

         Status:     Optional

         Optional
         attributes: 1) The Track TCP state flag should be set if this
                        event occurs.

      Event16: TCP connection request Acknowledged

         Definition: Event indicating the local system's request
                     to establish a TCP connection to the remote
                     peer.

                     The local system's TCP session sent a TCP
                     SYN, received a TCP SYN, ACK message,
                     and sent a TCP ACK.

         Status:     Mandatory

      Event17: TCP connection confirmed

         Definition: Event indicating that the local system has
                     received a confirmation that the TCP connection
                     has been established by the remote site.

                     The remote peer's TCP engine sent a TCP SYN.
                     The local peer's TCP engine sent a SYN, ACK
                     message, and now has received a final ACK.

         Status:     Mandatory

      Event18: TCP connection fails

         Definition: Event indicating that the local system has
                     received a TCP connection failure notice.

                     The remote BGP peer's TCP machine could have
                     sent a FIN. The local peer would respond
                     with a FIN-ACK. Another alternative is that
                     the local peer indicated a timeout in the
                     TCP session and downed the connection.

         Status:     Mandatory

8.1.4 BGP Messages based Events

      Event19: BGPOpen

         Definition: An event is generated when a valid OPEN
                     message has been received.

         Status:     Mandatory

         Optional
         attributes: 1) Delay Open flag SHOULD NOT be set
                     2) Open Delay timer SHOULD NOT be running

      Event20: BGPOpen with Open Delay Timer running

         Definition: An event is generated when a valid OPEN
                     message has been received for a peer
                     that has a successfully established
                     transport connection and is currently
                     delaying the sending of a BGP OPEN
                     message.

         Status:     Optional

         Optional
         attributes: 1) Delay Open flag SHOULD be set
                     2) Open Delay timer SHOULD be running.

      Event21: BGPHeaderErr

         Definition: An event is generated when a received
                     BGP message header is not valid.

         Status:     Mandatory

      Event22: BGPOpenMsgErr

         Definition: An event is generated when an OPEN message
                     has been received with errors.

         Status:     Mandatory

      Event23: Open collision dump

         Definition: An event generated administratively
                     when a connection collision has been
                     detected while processing an incoming
                     OPEN message and this connection has been
                     selected to be disconnected. See Section
                     6.8 for more information on collision
                     detection.

                     Event23 is administratively generated, based
                     only on implementation specific policy. This
                     Event may occur if the FSM is implemented
                     as two linked state machines.

         Status:     Optional, depending on local system

         Optional
         attributes: If the state machine is to process this
                     event in Established state,
                     1) Perform Collision detect in Established
                        flag SHOULD be set.

                     Please note: The Open collision dump can occur
                     in Idle, Connect, Active, OpenSent, OpenConfirm
                     without any optional flags being set.

      Event24: NotifMsgVerErr

         Definition: An event is generated when a
                     NOTIFICATION message with "version
                     error" is received.

         Status:     Mandatory

      Event25: NotifMsg

         Definition: An event is generated when a
                     NOTIFICATION message is received and
                     the error code is anything but
                     "version error".

         Status:     Mandatory

      Event26: KeepAliveMsg

         Definition: An event is generated when a KEEPALIVE
                     message is received.

         Status:     Mandatory

      Event27: UpdateMsg

         Definition: An event is generated when a valid
                     UPDATE message is received.
2092      Status:     Mandatory
2094   Event28: UpdateMsgErr
2096      Definition: An event is generated when an invalid
2097                  UPDATE message is received.
2099      Status:     Mandatory
2101   8.2 Description of FSM
2103   8.2.1 FSM Definition
2105   BGP MUST maintain a separate FSM for each configured peer. Each BGP
2106   peer paired in a potential connection, unless configured to remain in
2109   the idle state or configured to remain passive, will attempt to
2110   connect to the other. For the purpose of this discussion, the active
2111   or connect side of the TCP connection (the side sending the first
2112   TCP SYN packet) is called outgoing. The
2113   passive or listening side (the sender of the first SYN ACK) is called
2114   an incoming connection (see Section 8.2.1.1 on the terms active and
2115   passive below).
2117   A BGP implementation MUST connect to and listen on TCP port 179 for
2118   incoming connections in addition to trying to connect to peers. For
2119   each incoming connection, a state machine MUST be instantiated.
2120   There exists a period in which the identity of the peer on the other
2121   end of an incoming connection is known but the BGP identifier is not
2122   known. During this time, both an incoming and an outgoing connection
2123   for the same configured peering may exist. This is referred to as a
2124   connection collision (see Section 6.8).
2126   A BGP implementation will have at most one FSM for each configured
2127   peering plus one FSM for each incoming TCP connection for which the
2128   peer has not yet been identified. Each FSM corresponds to exactly one
2129   TCP connection.
2131   There may be more than one connection between a pair of peers if the
2132   connections are configured to use a different pair of IP addresses.
2133   This is referred to as multiple "configured peerings" to the same
2134   peer.
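The FSM bookkeeping described in Section 8.2.1 can be sketched as a small registry. This is only an illustration under stated assumptions: the names `Fsm`, `FsmRegistry`, `add_peering` and `accept_incoming` are hypothetical, a configured peering is keyed by its pair of IP addresses, and the addresses in the usage note are documentation examples.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional, Tuple

# Assumption: a "configured peering" is identified by (local IP, remote IP).
Peering = Tuple[str, str]


@dataclass
class Fsm:
    """One FSM instance; it corresponds to exactly one TCP connection."""
    state: str = "Idle"
    peering: Optional[Peering] = None  # None until the peer is identified


@dataclass
class FsmRegistry:
    """Hypothetical container for the FSMs of one BGP implementation."""
    configured: Dict[Peering, Fsm] = field(default_factory=dict)
    unidentified: Dict[int, Fsm] = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def add_peering(self, local_ip: str, remote_ip: str) -> Fsm:
        # At most one FSM per configured peering: reuse it if it exists.
        key = (local_ip, remote_ip)
        return self.configured.setdefault(key, Fsm(peering=key))

    def accept_incoming(self) -> int:
        # A state machine is instantiated for each incoming connection,
        # before the peer behind that connection has been identified.
        conn_id = next(self._ids)
        self.unidentified[conn_id] = Fsm()
        return conn_id

    def fsm_count(self) -> int:
        return len(self.configured) + len(self.unidentified)
```

For example, calling `add_peering("192.0.2.1", "192.0.2.2")` twice returns the same `Fsm`, while each call to `accept_incoming()` adds a new one until collision detection (Section 6.8) decides which connection survives.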
2136   8.2.1.1 Terms "active" and "passive"
2138   The terms active and passive have been in our vocabulary for almost a
2139   decade and have proven useful. The words active and passive have
2140   slightly different meanings when applied to a TCP connection or to
2141   a peer. There is only one active side and one passive side to any
2142   one TCP connection per the definition above and the state machine
2143   below. When a BGP speaker is configured active it may end up on
2144   either the active or passive side of the connection that eventually
2145   gets established. Once the TCP connection is completed, it doesn't
2146   matter which end was active and which end was passive; the only
2147   difference is which side of the TCP connection has port number 179.
2149   8.2.1.2 FSM and collision detection
2151   There is one FSM per BGP connection. Prior to determining what peer
2152   a connection is associated with there may be two connections for a
2155   given peer. There SHOULD be no more than one connection per peer.
2156   The collision detection identifies the case where there is more than
2157   one connection per peer and provides guidance for which connection to
2158   get rid of. When this occurs, the corresponding FSM for the connection
2159   that is closed SHOULD be disposed of.
2161   8.2.1.3 FSM and Optional Attributes
2163   Optional Attributes specify either flags that augment the normal
2164   processing of the BGP FSM, or optional timers. If an Optional attribute
2165   can be set on a system, the Events and the BGP FSM actions must be
2166   supported. For example, if the following options can be set in a BGP
2167   implementation: AutoStart and Passive TCP connection Establishment
2168   flag, then the events 3, 4 and 5 must be supported.
2170   If an Optional attribute cannot be set (that is, declared always
2171   off logically), the events supporting that set of options do not have
2172   to be supported.
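The AutoStart and Passive TCP example in Section 8.2.1.3 can be sketched as a lookup from settable options to the events an implementation then has to support. The function name `required_start_events` is hypothetical, and the per-flag split is an assumption read off the start-event descriptions (Event3 automatic start, Event4 manual start with the passive flag, Event5 automatic start with the passive flag); the text itself only states the combined result.

```python
from typing import Set


def required_start_events(auto_start: bool, passive_tcp: bool) -> Set[int]:
    """Return the optional start events implied by the settable options.

    Assumed mapping (not spelled out in the text):
      AutoStart alone          -> Event3
      Passive TCP alone        -> Event4
      AutoStart + Passive TCP  -> Event5 as well
    """
    events: Set[int] = set()
    if auto_start:
        events.add(3)
    if passive_tcp:
        events.add(4)
    if auto_start and passive_tcp:
        events.add(5)
    return events
```

With both options settable the sketch yields events 3, 4 and 5, matching the example in the text; with neither settable it yields the empty set, matching the rule that unsupported options need no events.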
2174   8.2.1.4 FSM Event numbers
2176   The Event numbers (1-28) utilized in this state machine description
2177   aid in specifying the behavior of the BGP state machine.
2178   Implementations MAY use these numbers to provide network management
2179   information.
2181   8.2.2 Finite State Machine
2183   Idle state:
2185      Initially BGP is in the Idle state.
2187      In this state BGP refuses all incoming BGP connections. No
2188      resources are allocated to the peer. In response to a
2189      manual start event (Event1) or an automatic start
2190      event (Event3), the local system:
2191        - initializes all BGP resources,
2192        - sets ConnectRetryCnt (the connect retry counter) to zero,
2193        - starts the connect retry timer with the initial value,
2194        - initiates a TCP connection to the other BGP peer,
2195        - listens for a connection that may be initiated by
2196          the remote BGP peer, and
2197        - changes its state to Connect.
2201      Manual stop (Event2) and automatic stop (Event8) events
2202      are ignored in the Idle state.
2204      In response to a manual start event with the passive TCP connection
2205      flag (Event4) or automatic start with the passive TCP connection
2206      flag (Event5), the local system:
2207        - initializes all BGP resources,
2208        - sets ConnectRetryCnt (the connect retry counter) to zero,
2209        - starts the connect retry timer with the initial value,
2210        - listens for a connection that may be initiated by
2211          the remote peer, and
2212        - changes its state to Active.
2214      The exact value of the ConnectRetry timer is a local
2215      matter, but it SHOULD be sufficiently large to allow TCP
2216      initialization.
2218      If the persistent peer oscillation damping function is
2219      enabled, three additional events may occur within Idle state:
2220        - Automatic start with peer_stop_flap set [Event6],
2221        - Automatic start with peer_stop_flag set [Event7],
2222        - Idle Hold Timer expired [Event13].
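The start-event handling in the Idle state can be sketched as follows. This is a minimal sketch, not a complete FSM: the `Peer` record and its field names are hypothetical, timers are reduced to booleans, and the damping events (6, 7, 13) and every other event are simply left without effect here.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical per-peer record; field names are illustrative only."""
    state: str = "Idle"
    connect_retry_cnt: int = 0
    connect_retry_timer_running: bool = False
    connecting: bool = False   # an outgoing TCP connection was initiated
    listening: bool = False    # waiting for the remote peer to connect


def handle_idle(peer: Peer, event: int) -> str:
    """Apply one event number to a peer that is in the Idle state."""
    if peer.state != "Idle":
        raise ValueError("handle_idle called outside the Idle state")
    if event in (1, 3, 4, 5):
        # Common to all four start events: initialize, zero the counter,
        # start the connect retry timer, and listen for the remote peer.
        peer.connect_retry_cnt = 0
        peer.connect_retry_timer_running = True
        peer.listening = True
        if event in (1, 3):
            # Manual or automatic start: also connect actively.
            peer.connecting = True
            peer.state = "Connect"
        else:
            # Start with the passive TCP flag: listen only.
            peer.state = "Active"
    # Event2 and Event8 (stop events) are ignored in Idle; nothing to do.
    return peer.state
```

The only difference between the two start paths is whether an outgoing TCP connection is initiated, which is why the sketch shares the first three actions.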
2224      The method of preventing persistent peer oscillation is
2225      outside the scope of this document.
2227      Any other events [Events 9-12, 15-28] received in the Idle state
2228      do not cause a change in the state of the local system.
2230   Connect State:
2232      In this state, BGP is waiting for the TCP connection to
2233      be completed.
2235      The start events [Event 1, 3-7] are ignored in the Connect
2236      state.
2238      In response to a manual stop event [Event2], the local system:
2239        - drops the TCP connection,
2240        - releases all BGP resources,
2241        - sets ConnectRetryCnt (the connect retry count) to zero,
2242        - resets the connect retry timer (sets to zero), and
2243        - changes its state to Idle.
2247      In response to the connect retry timer expires event [Event
2248      9], the local system:
2249        - drops the TCP connection,
2250        - restarts the connect retry timer,
2251        - stops the Open Delay timer and resets the timer to zero,
2252        - initiates a TCP connection to the other BGP peer,
2253        - continues to listen for a connection that may be
2254          initiated by the remote BGP peer, and
2255        - stays in the Connect state.
2257      If the Open Delay timer expires [Event12] in the Connect
2258      state, the local system:
2259        - sends an OPEN message to its peer,
2260        - sets the hold timer to a large value, and
2261        - changes its state to OpenSent.
2263      If the BGP port receives a valid TCP connection indication
2264      [Event 14], the TCP connection is processed and
2265      the connection remains in the Connect state.
2267      If the local system receives an invalid TCP connection indication
2268      [Event 15], it rejects the TCP connection and the connection
2269      remains in the Connect state.
2271      If the TCP connection succeeds [Event 16 or
2272      Event 17], the local system checks the Delay Open flag prior
2273      to processing.
     If the Delay Open flag is set, the local system:
2274        - clears the connect retry timer,
2275        - sets the Open Delay timer to the initial value, and
2276        - stays in the Connect state.
2277      If the Delay Open flag is not set, the local system:
2278        - clears the connect retry timer,
2279        - completes BGP initialization,
2280        - sends an OPEN message to its peer,
2281        - sets the hold timer to a large value, and
2282        - changes its state to OpenSent.
2284      A hold timer value of 4 minutes is suggested.
2286      If the TCP connection fails [Event18], the local system checks
2287      the Open Delay timer. If the Open Delay timer is running,
2288      the local system:
2289        - restarts the connect retry timer with the initial value,
2290        - stops the Open Delay timer and resets its value to zero,
2291        - continues to listen for a connection that may be
2292          initiated by the remote BGP peer, and
2293        - changes its state to Active.
2294      If the Open Delay timer is not running, the local system:
2298        - resets the connect retry timer (sets to zero),
2299        - drops the TCP connection,
2300        - releases all BGP resources, and
2301        - changes its state to Idle.
2303      If an OPEN message is received while the Open Delay timer is
2304      running [Event 20], the local system:
2305        - clears the connect retry timer (cleared to zero),
2306        - completes the BGP initialization,
2307        - stops and clears the Open Delay timer,
2308        - sends an OPEN message,
2309        - sends a KEEPALIVE message,
2310        - if the hold timer value is non-zero,
2311            - starts the keepalive timer with the initial value,
2312            - resets the hold timer to the negotiated value,
2313          else if the hold timer value is zero,
2314            - resets the keepalive timer, and
2315            - resets the hold timer value to zero,
2316        - and changes its state to OpenConfirm.
2318      If the value of the autonomous system field is the same as the local
2319      Autonomous System number, set the connection status to an "internal"
2320      connection; otherwise it is "external".
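The timer setup and the internal/external decision made when an OPEN arrives while the Open Delay timer is running [Event 20] can be sketched like this. The function names are hypothetical, and `keepalive_initial` is a caller-supplied assumption because the text only says "initial value".

```python
from typing import Dict


def timers_after_open(negotiated_hold_time: int,
                      keepalive_initial: int) -> Dict[str, int]:
    """Return the timer values (seconds) to apply after a valid OPEN.

    A negotiated hold time of zero leaves both timers at zero; any other
    value starts the keepalive timer and arms the hold timer with the
    negotiated value.
    """
    if negotiated_hold_time != 0:
        return {"keepalive": keepalive_initial, "hold": negotiated_hold_time}
    return {"keepalive": 0, "hold": 0}


def connection_type(local_as: int, peer_as: int) -> str:
    """Classify the connection from the Autonomous System field of the OPEN."""
    return "internal" if peer_as == local_as else "external"
```

The AS numbers used to exercise `connection_type` would come from local configuration and from the received OPEN message respectively.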
2322      If BGP message header checking detects an error [Event 21] or
2323      OPEN message checking detects an error [Event 22] (see Section
2324      6.2), the local system:
2325        - (optionally) if the Send Notification without Open flag is set,
2326          first sends a NOTIFICATION message
2327          with the appropriate error code, and then
2329        - resets the connect retry timer (sets to zero),
2330        - releases all BGP resources,
2331        - drops the TCP connection,
2332        - increments the ConnectRetryCnt (connect retry count) by 1,
2333        - optionally performs peer oscillation damping, and
2334        - changes its state to Idle.
2336      If a NOTIFICATION message is received with a version
2337      error [Event24], the local system checks the Open Delay timer.
2338      If the Open Delay timer is running, the local system:
2339        - resets the connect retry timer (sets to zero),
2340        - stops and resets the Open Delay timer (sets to zero),
2341        - releases all BGP resources,
2342        - drops the TCP connection, and
2343        - changes its state to Idle.
2344      If the Open Delay timer is not running, the local system:
2345        - resets the connect retry timer (sets to zero),
2348        - releases all BGP resources,
2349        - drops the TCP connection,
2350        - increments the ConnectRetryCnt (connect retry count) by 1,
2351        - optionally performs peer oscillation damping, and
2352        - changes its state to Idle.
2354      In response to any other events [Events 8,10-11,13,19,23,
2355      25-28] the local system:
2356        - if the connect retry timer is running,
2357          stops and resets the connect retry timer (sets to zero),
2358        - if the Delay Open timer is running,
2359          stops and resets the Delay Open timer (sets to zero),
2360        - releases all BGP resources,
2361        - drops the TCP connection,
2362        - increments the ConnectRetryCnt (connect retry count) by 1,
2363        - optionally performs peer oscillation damping, and
2364        - changes its state to Idle.
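Taken together, the Connect state rules reduce to a next-state function of the event number and two pieces of local state. The sketch below covers only the resulting state, not the timer, counter, or message actions; the function name and its keyword parameters are hypothetical.

```python
START_EVENTS = {1, 3, 4, 5, 6, 7}


def connect_next_state(event: int,
                       delay_open: bool = False,
                       open_delay_running: bool = False) -> str:
    """Return the state that follows `event` when the FSM is in Connect."""
    if not 1 <= event <= 28:
        raise ValueError("event numbers run from 1 to 28")
    if event in START_EVENTS:
        return "Connect"              # start events are ignored
    if event == 9:
        return "Connect"              # connect retry timer expired: retry
    if event == 12:
        return "OpenSent"             # Open Delay timer expired: send OPEN
    if event in (14, 15):
        return "Connect"              # TCP indications keep the state
    if event in (16, 17):
        # TCP connection succeeded: wait if Delay Open is set.
        return "Connect" if delay_open else "OpenSent"
    if event == 18:
        # TCP connection failed.
        return "Active" if open_delay_running else "Idle"
    if event == 20:
        return "OpenConfirm"          # OPEN received while delaying
    # Event2, Events 21, 22, 24 and every remaining event lead to Idle.
    return "Idle"
```

A table-driven dispatcher would work equally well; a chain of conditions is used here because two of the transitions depend on flags rather than on the event alone.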
2366   Active State:
2368      In this state BGP is trying to acquire a peer by listening
2369      for and accepting a TCP connection.
2371      The start events [Event1, 3-7] are ignored in the Active
2372      state.
2374      In response to a manual stop event [Event2], the local system:
2375        - if the Delay Open timer is running and the
2376          Send NOTIFICATION without Open flag is set,
2377          sends a NOTIFICATION with a Cease,
2378        - releases all BGP resources, including
2379          stopping the Open Delay timer,
2380        - drops the TCP connection,
2381        - sets ConnectRetryCnt (connect retry count) to zero,
2382        - resets the connect retry timer (sets to zero), and
2383        - changes its state to Idle.
2385      In response to the ConnectRetry timer expires event [Event9],
2386      the local system:
2387        - restarts the connect retry timer (with the initial value),
2388        - initiates a TCP connection to the other BGP peer,
2389        - continues to listen for a TCP connection that may be
2390          initiated by the remote BGP peer, and
2391        - changes its state to Connect.
2395      If the Open Delay timer expires
2396      [Event12], the local system:
2397        - clears the connect retry timer (set to zero),
2398        - stops and clears the Open Delay timer (set to zero),
2399        - completes the BGP initialization,
2400        - sends the OPEN message to its remote peer,
2401        - sets its hold timer to a large value, and
2402        - changes its state to OpenSent.
2404      A hold timer value of 4 minutes is also suggested for this
2405      state transition.
2407      If the local system receives a valid TCP indication
2408      [Event 14], the local system processes the TCP connection
2409      flags, and stays in the Active state.
2411      If the local system receives an invalid TCP indication [Event 15],
2412      the local system rejects the TCP connection, and stays in
2413      the Active state.
2415      If a TCP connection succeeds [Event 16 or Event 17], the
2416      local system checks the "Delay Open Flag" prior to
2417      processing.
     If the Delay Open flag is set, the local system:
2418        - clears the connect retry timer,
2419        - sets the BGP Open Delay timer to the initial value, and
2420        - stays in the Active state.
2422      If the Delay Open flag is not set, the local system:
2423        - clears the connect retry timer,
2424        - completes the BGP initialization,
2425        - sends the OPEN message to its peer,
2426        - sets its hold timer to a large value, and
2427        - changes its state to OpenSent.
2429      A hold timer value of 4 minutes is suggested as a "large value" for
2430      the hold timer.
2432      If the local system receives a TCP connection fails event [Event 18],
2433      the local system:
2434        - restarts the connect retry timer (with the initial value),
2435        - stops and clears the Open Delay timer (sets the value to zero),
2436        - releases all BGP resources,
2437        - acknowledges the drop of the TCP connection on a
2438          TCP disconnect (sends a FIN ACK),
2439        - increments ConnectRetryCnt (connect retry count) by 1,
2440        - optionally performs peer oscillation damping, and
2443        - changes its state to Idle.
2445      If an OPEN message is received while the Open Delay timer is
2446      running [Event 20], the local system:
2447        - clears the connect retry timer (cleared to zero),
2448        - stops and clears the Open Delay timer,
2449        - completes the BGP initialization,
2450        - sends an OPEN message,
2451        - sends a KEEPALIVE message,
2452        - if the hold timer value is non-zero,
2453            - starts the keepalive timer with the initial value,
2454            - resets the hold timer to the negotiated value,
2455          else if the hold timer is zero,
2456            - resets the keepalive timer (set to zero),
2457            - resets the hold timer to zero, and
2458        - changes its state to OpenConfirm.
2460      If the value of the autonomous system field is the same as the local
2461      Autonomous System number, set the connection status to an "internal"
2462      connection; otherwise it is "external".
2464      If BGP message header checking detects an error [Event 21] or OPEN
2465      message checking detects an error [Event 22] (see Section 6.2), the
2466      local system:
2467        - (optionally) sends a NOTIFICATION message with the
2468          appropriate error code,
2469        - resets the connect retry timer (sets to zero),
2470        - releases all BGP resources,
2471        - drops the TCP connection,
2472        - increments the ConnectRetryCnt (connect retry count) by 1,
2473        - optionally performs peer oscillation damping, and
2474        - changes its state to Idle.
2476      If a NOTIFICATION message is received with a version
2477      error [Event24], the local system checks the Open Delay timer.
2478      If the Open Delay timer is running, the local system:
2479        - resets the connect retry timer (sets to zero),
2480        - stops and resets the Open Delay timer (sets to zero),
2481        - releases all BGP resources,
2482        - drops the TCP connection, and
2483        - changes its state to Idle.
2484      If the Open Delay timer is not running, the local system:
2485        - resets the connect retry timer (sets to zero),
2486        - releases all BGP resources,
2487        - drops the TCP connection,
2488        - increments the ConnectRetryCnt (connect retry count) by 1,
2489        - optionally performs peer oscillation damping, and
2492        - changes its state to Idle.
2494      In response to any other event [Events 8,10-11,13,19,23,25-28],
2495      the local system:
2496        - resets the connect retry timer (sets to zero),
2497        - drops the TCP connection,
2498        - releases all BGP resources,
2499        - increments the ConnectRetryCnt (connect retry count) by 1,
2500        - optionally performs peer oscillation damping, and
2501        - changes its state to Idle.
2503   OpenSent:
2505      In this state BGP waits for an OPEN message from its peer.
2507      The start events [Event1, 3-7] are ignored in the OpenSent
2508      state.
2510      If a manual stop event [Event 2] is issued in the OpenSent
2511      state, the local system:
2512        - sends the NOTIFICATION with a Cease,
2513        - releases all BGP resources,
2514        - drops the TCP connection,
2515        - sets ConnectRetryCnt (connect retry count) to zero,
2516        - resets the connect retry timer (sets to zero), and
2517        - changes its state to Idle.
2519      If an automatic stop event [Event 8] is issued in the OpenSent
2520      state, the local system:
2521        - sends the NOTIFICATION with a Cease,
2522        - releases all BGP resources,
2523        - drops the TCP connection,
2524        - increments the ConnectRetryCnt (connect retry count) by 1,
2525        - optionally performs peer oscillation damping, and
2526        - changes its state to Idle.
2528      If the Hold Timer expires [Event 10], the local system:
2529        - sends a NOTIFICATION message with error code Hold
2530          Timer Expired,
2531        - resets the connect retry timer (sets to zero),
2532        - releases all BGP resources,
2533        - drops the TCP connection,
2534        - increments the ConnectRetryCnt (connect retry count) by 1, and
2535        - changes its state to Idle.
2537      If a TCP indication for a valid connection
2538      [Event 14], a TCP request acknowledgement [Event 16],
2541      or a TCP connect confirm [Event 17] is
2542      received, a second TCP session may be in progress. This
2543      second TCP session is tracked per the Connection Collision
2544      processing (Section 6.8) until an OPEN message is received.
2546      A TCP connection for an invalid port [Event 15] is ignored.
2548      If a TCP connection fails indication [Event18] is received,
2549      the local system:
2550        - closes the BGP connection,
2551        - restarts the connect retry timer,
2552        - continues to listen for a connection that may be
2553          initiated by the remote BGP peer, and
2554        - changes its state to Active.
2556      When an OPEN message is received, all fields are checked
2557      for correctness.
     If there are no errors in the OPEN message
2558      [Event 19] the local system:
2559        - resets the Open Delay timer to zero,
2560        - resets the BGP Connect timer to zero,
2561        - sends a KEEPALIVE message,
2562        - sets a KeepAlive timer (per the text below),
2563        - sets the hold timer according to the negotiated value
2564          (see Section 4.2), and
2565        - changes its state to OpenConfirm.
2567      If the negotiated hold time value is zero, then the Hold and
2568      KeepAlive timers are not started. If the value of the Autonomous
2569      System field is the same as the local Autonomous System number,
2570      then the connection is an "internal" connection; otherwise, it
2571      is an "external" connection. (This will impact UPDATE processing
2572      as described below.)
2574      If the BGP message header checking [Event21] or OPEN message
2575      check detects an error (see Section 6.2) [Event22], the local system:
2576        - sends a NOTIFICATION message with the appropriate error
2577          code,
2578        - resets the connect retry timer (sets to zero),
2579        - releases all BGP resources,
2580        - drops the TCP connection,
2581        - increments the ConnectRetryCnt (connect retry count) by 1,
2582        - optionally performs peer oscillation damping, and
2583        - changes its state to Idle.
2585      Collision detection mechanisms (Section 6.8) need to be
2586      applied when a valid BGP OPEN message is received [Event 19 or
2589      Event 20]. Please refer to Section 6.8 for the details of
2590      the comparison. An administrative collision detect occurs when a
2591      BGP implementation determines by means outside the scope of
2592      this document that a connection collision has occurred.
2594      If a connection in OpenSent is determined to be the
2595      connection that must be closed, an open collision dump [Event 23]
2596      is signaled to the state machine.
     If such an event is
2597      received in OpenSent, the local system:
2598        - sends a NOTIFICATION with a Cease,
2599        - resets the connect retry timer,
2600        - releases all BGP resources,
2601        - drops the TCP connection,
2602        - increments ConnectRetryCnt (connect retry count) by 1,
2603        - optionally performs peer oscillation damping, and
2604        - changes its state to Idle.
2606      If a NOTIFICATION message is received with a version
2607      error [Event24], the local system:
2608        - resets the connect retry timer (sets to zero),
2609        - releases all BGP resources,
2610        - drops the TCP connection, and
2611        - changes its state to Idle.
2613      In response to any other event [Events 9, 11-13,20,25-28],
2614      the local system:
2615        - sends the NOTIFICATION with the Error Code Finite
2616          State Machine Error,
2617        - resets the connect retry timer (sets to zero),
2618        - releases all BGP resources,
2619        - drops the TCP connection,
2620        - increments the ConnectRetryCnt (connect retry count) by 1,
2621        - optionally performs peer oscillation damping, and
2622        - changes its state to Idle.
2624   OpenConfirm State:
2626      In this state BGP waits for a KEEPALIVE or NOTIFICATION
2627      message.
2629      Any start event [Event1, 3-7] is ignored in the OpenConfirm
2630      state.
2634      In response to a manual stop event [Event 2] initiated by
2635      the operator, the local system:
2636        - sends the NOTIFICATION message with Cease,
2637        - releases all BGP resources,
2638        - drops the TCP connection,
2639        - sets the ConnectRetryCnt (connect retry count) to zero,
2640        - sets the connect retry timer to zero, and
2641        - changes its state to Idle.
2643      In response to the automatic stop event initiated by the
2644      system [Event 8], the local system:
2645        - sends the NOTIFICATION message with Cease,
2646        - resets the connect retry timer (sets to zero),
2647        - releases all BGP resources,
2648        - drops the TCP connection,
2649        - increments the ConnectRetryCnt (connect retry count)
2650          by 1,
2651        - optionally performs peer oscillation damping, and
2652        - changes its state to Idle.
2654      If the Hold Timer expires before a KEEPALIVE message is
2655      received [Event 10], the local system:
2656        - sends the NOTIFICATION message with the error code
2657          set to Hold Time Expired,
2658        - resets the connect retry timer (sets the timer to
2659          zero),
2660        - releases all BGP resources,
2661        - drops the TCP connection,
2662        - increments the ConnectRetryCnt (connect retry count)
2663          by 1,
2664        - optionally performs peer oscillation damping,
2665          and
2666        - changes its state to Idle.
2668      If the local system receives a KEEPALIVE timer expires
2669      event [Event 11], the system:
2670        - sends a KEEPALIVE message,
2671        - restarts the KeepAlive timer, and
2672        - remains in the OpenConfirm state.
2674      In the event of a TCP connection valid indication [Event 14], or a TCP
2675      connection succeeding [Event 16 or Event 17] while in OpenConfirm,
2676      the local system needs to track the second connection.
2678      If a TCP connection is attempted to an invalid port [Event
2679      15], the local system will ignore the second connection
2680      attempt.
2684      If the local system receives a TCP connection fails event
2685      [Event 18] from the underlying TCP, or a NOTIFICATION
2686      message [Event 25], the local system:
2687        - resets the connect retry timer (sets the timer to
2688          zero),
2689        - releases all BGP resources,
2690        - drops the TCP connection,
2691        - increments the ConnectRetryCnt (connect retry count)
2692          by 1,
2693        - optionally performs peer oscillation damping, and
2694        - changes its state to Idle.
2696      If the local system receives a NOTIFICATION message [Event 24] with
2697      a version error, the local system:
2698        - resets the connect retry timer (sets the timer to zero),
2699        - releases all BGP resources,
2700        - drops the TCP connection, and
2701        - changes its state to Idle. [Verify this/or above]
2703      If the OPEN message is valid [Event 19], the collision
2704      detect function is processed per Section 6.8. If this
2705      connection is to be dropped due to connection collision, the
2706      local system:
2707        - sends a NOTIFICATION with a Cease,
2708        - resets the Connect timer (sets to zero),
2709        - releases all BGP resources,
2710        - drops the TCP connection (sends a TCP FIN),
2711        - increments the ConnectRetryCnt (connect retry count) by 1, and
2712        - optionally performs peer oscillation damping.
2714      If an OPEN message is received, all fields are checked for
2715      correctness. If the BGP message header checking [Event21]
2716      or OPEN message check detects an error (see Section
2717      6.2) [Event22], the local system:
2718        - sends a NOTIFICATION message with the appropriate error
2719          code,
2720        - resets the connect retry timer (sets the timer to
2721          zero),
2722        - releases all BGP resources,
2723        - drops the TCP connection,
2724        - increments the ConnectRetryCnt (connect retry count) by 1,
2725        - optionally performs peer oscillation damping, and
2726        - changes its state to Idle.
2728      If, during the processing of another OPEN message, the BGP
2731      implementation determines by means outside the scope of
2732      this document that a connection collision has occurred and
2733      this connection is to be closed, the local system will
2734      issue an open collision dump [Event 23].
     When the local
2735      system receives an open collision dump event [Event 23], the
2736      local system:
2737        - sends a NOTIFICATION with a Cease,
2738        - resets the connect retry timer,
2739        - releases all BGP resources,
2740        - drops the TCP connection,
2741        - increments the ConnectRetryCnt (connect retry count) by 1,
2742        - optionally performs peer oscillation damping, and
2743        - changes its state to Idle.
2745      If the local system receives a KEEPALIVE message [Event 26],
     the local system:
2746        - restarts the Hold timer, and
2747        - changes its state to Established.
2749      In response to any other event [Events 9, 12-13, 27-28],
2750      the local system:
2751        - sends a NOTIFICATION with a code of Finite State
2752          Machine Error,
2753        - resets the connect retry timer (sets to zero),
2754        - releases all BGP resources,
2755        - drops the TCP connection,
2756        - increments the ConnectRetryCnt (connect retry count) by 1,
2757        - optionally performs peer oscillation damping, and
2758        - changes its state to Idle.
2760   Established State:
2762      In the Established state BGP can exchange UPDATE,
2763      NOTIFICATION, and KEEPALIVE messages with its peer.
2765      Any start event [Event 1, 3-7] is ignored in the
2766      Established state.
2768      In response to a manual stop event (initiated by an
2769      operator) [Event2], the local system:
2770        - sends the NOTIFICATION message with Cease,
2771        - resets the connect retry timer to zero (0),
2772        - deletes all routes associated with this connection,
2773        - releases all BGP resources,
2774        - drops the TCP connection,
2775        - sets ConnectRetryCnt (connect retry count)
2778          to zero (0), and
2779        - changes its state to Idle.
2781      In response to an automatic stop event initiated by the
2782      system [Event8], the local system:
2783        - sends a NOTIFICATION with Cease,
2784        - resets the connect retry timer (sets to zero),
2785        - deletes all routes associated with this connection,
2786        - releases all BGP resources,
2787        - drops the TCP connection,
2788        - increments the ConnectRetryCnt (connect retry count)
2789          by 1,
2790        - optionally performs peer oscillation damping, and
2791        - changes its state to Idle.
2793      An example automatic stop event is exceeding the number of
2794      prefixes for a given peer and the local system
2795      automatically disconnecting the peer.
2797      If the Hold timer expires [Event10], the local system:
2798        - sends a NOTIFICATION message with Error Code Hold
2799          Timer Expired,
2800        - resets the connect retry timer (sets to zero),
2801        - releases all BGP resources,
2802        - drops the TCP connection,
2803        - increments the ConnectRetryCnt (connect retry count)
2804          by 1,
2805        - optionally performs peer oscillation damping, and
2806        - changes its state to Idle.
2808      If the KeepAlive timer expires [Event11], the local system
2809      sends a KEEPALIVE message and restarts its KeepAlive timer,
2810      unless the negotiated Hold Time value is zero.
2812      Each time the local system sends a KEEPALIVE or UPDATE
2813      message, it restarts its KeepAlive timer, unless the
2814      negotiated Hold Time value is zero.
2816      A TCP connection indication [Event 14] received
2817      for a valid port will cause the second connection to be
2818      tracked.
2820      A TCP connection indication for an invalid port [Event 15]
2821      will be ignored.
2823      In response to a TCP connection succeeds event [Event 16
2826      or Event 17], the second connection SHALL be tracked until
2827      it sends an OPEN message.
2829      If a valid OPEN message [Event 19] is received, it will be
2830      checked to see if it collides (Section 6.8) with any other
2831      session.
     If the BGP implementation determines that this
2832      connection needs to be terminated, it will process an open
2833      collision dump event [Event 23]. If this session needs to be
2834      terminated, the local system:
2836        - sends a NOTIFICATION with a Cease,
2837        - resets the connect retry timer (sets to zero),
2838        - deletes all routes associated with this connection,
2839        - releases all BGP resources,
2840        - drops the TCP connection,
2841        - increments ConnectRetryCnt (connect retry count)
2842          by 1,
2843        - optionally performs peer oscillation damping, and
2844        - changes its state to Idle.
2846      If the local system receives a NOTIFICATION message
2847      [Event24 or Event 25] or a TCP connection fails event [Event18]
2848      from the underlying TCP, it:
2849        - resets the connect retry timer (sets to zero),
2850        - deletes all routes associated with this connection,
2851        - releases all BGP resources,
2852        - drops the TCP connection,
2853        - increments the ConnectRetryCnt (connect retry count)
2854          by 1, and
2855        - changes its state to Idle.
2857      If the local system receives a KEEPALIVE message
2858      [Event 26], the local system will:
2859        - restart its Hold timer, if the negotiated Hold Time
2860          value is non-zero, and
2861        - remain in the Established state.
2863      If the local system receives an UPDATE message [Event27],
2864      the local system will:
2865        - process the UPDATE message,
2866        - restart its Hold timer, if the negotiated Hold Time
2867          value is non-zero, and
2868        - remain in the Established state.
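The Hold and KeepAlive timer rules of the Established state can be sketched as deadline bookkeeping. This is an illustration only: the `SessionTimers` class, its field names, and the use of caller-supplied timestamps are assumptions, and message validation, NOTIFICATION handling and the expiry events are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionTimers:
    """Hypothetical timer record for one Established session."""
    hold_time: int        # negotiated Hold Time in seconds; 0 disables timers
    keepalive_time: int   # KeepAlive interval in seconds (local choice)
    hold_deadline: Optional[float] = None
    keepalive_deadline: Optional[float] = None

    def on_receive(self, msg_type: str, now: float) -> str:
        # A received KEEPALIVE or valid UPDATE restarts the Hold timer,
        # but only when the negotiated Hold Time is non-zero.
        if msg_type in ("KEEPALIVE", "UPDATE") and self.hold_time != 0:
            self.hold_deadline = now + self.hold_time
        return "Established"

    def on_send(self, msg_type: str, now: float) -> None:
        # Sending a KEEPALIVE or UPDATE restarts the KeepAlive timer,
        # unless the negotiated Hold Time is zero.
        if msg_type in ("KEEPALIVE", "UPDATE") and self.hold_time != 0:
            self.keepalive_deadline = now + self.keepalive_time
```

Passing `now` in explicitly keeps the sketch deterministic; a real implementation would more likely read a monotonic clock and schedule Event10 and Event11 from these deadlines.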
2872      If the local system receives an UPDATE message, and the
2873      UPDATE message error handling procedure (see Section 6.3)
2874      detects an error [Event28], the local system:
2875        - sends a NOTIFICATION message with Update error,
2876        - resets the connect retry timer (sets to zero),
2877        - deletes all routes associated with this connection,
2878        - releases all BGP resources,
2879        - drops the TCP connection,
2880        - increments the ConnectRetryCnt (connect retry count)
2881          by 1,
2882        - optionally performs peer oscillation damping, and
2883        - changes its state to Idle.
2885      In response to any other event [Events 9, 12-13, 20-22] the
2886      local system:
2887        - sends a NOTIFICATION message with Error Code Finite
2888          State Machine Error,
2889        - deletes all routes associated with this connection,
2890        - resets the connect retry timer (sets to zero),
2891        - releases all BGP resources,
2892        - drops the TCP connection,
2893        - increments the ConnectRetryCnt (connect retry count)
2894          by 1,
2895        - optionally performs peer oscillation damping, and
2896        - changes its state to Idle.
2898   9. UPDATE Message Handling
2900   An UPDATE message may be received only in the Established state.
2901   When an UPDATE message is received, each field is checked for
2902   validity as specified in Section 6.3.
2904   If an optional non-transitive attribute is unrecognized, it is
2905   quietly ignored. If an optional transitive attribute is unrecognized,
2906   the Partial bit (the third high-order bit) in the attribute flags
2907   octet is set to 1, and the attribute is retained for propagation to
2908   other BGP speakers.
2910   If an optional attribute is recognized, and has a valid value, then,
2911   depending on the type of the optional attribute, it is processed
2912   locally, retained, and updated, if necessary, for possible
2913   propagation to other BGP speakers.
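The handling of unrecognized optional attributes comes down to a few bit operations on the attribute flags octet. In this sketch the Partial bit position follows the text (third high-order bit, 0x20); the Optional (0x80) and Transitive (0x40) positions are taken from the path attribute encoding of the UPDATE message format and should be checked against that section. The function name is hypothetical, and raising an exception for an unrecognized well-known attribute merely stands in for the error handling of Section 6.3.

```python
from typing import Optional

OPTIONAL = 0x80    # high-order bit of the attribute flags octet
TRANSITIVE = 0x40  # second high-order bit
PARTIAL = 0x20     # third high-order bit


def flags_for_unrecognized(flags: int) -> Optional[int]:
    """Decide what to do with an attribute the speaker does not recognize.

    Returns the flags octet to store when the attribute is retained for
    propagation, or None when the attribute is quietly ignored.
    """
    if not flags & OPTIONAL:
        # Well-known attributes must be recognized; this is an error case.
        raise ValueError("unrecognized well-known attribute")
    if flags & TRANSITIVE:
        return flags | PARTIAL   # retain and mark as partial
    return None                  # optional non-transitive: ignore
```

For an optional transitive attribute with flags 0xC0, the sketch returns 0xE0; for an optional non-transitive attribute with flags 0x80, it returns None.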
If the UPDATE message contains a non-empty WITHDRAWN ROUTES field,
the previously advertised routes whose destinations (expressed as IP
prefixes) are contained in this field SHALL be removed from the
Adj-RIB-In. This BGP speaker SHALL run its Decision Process since the
previously advertised route is no longer available for use.

If the UPDATE message contains a feasible route, the Adj-RIB-In will
be updated with this route as follows: if the NLRI of the new route
is identical to the one of the route currently stored in the
Adj-RIB-In, then the new route SHALL replace the older route in the
Adj-RIB-In, thus implicitly withdrawing the older route from service.
Otherwise, if the Adj-RIB-In has no route with NLRI identical to the
new route, the new route SHALL be placed in the Adj-RIB-In.

Once the BGP speaker updates the Adj-RIB-In, the speaker SHALL run
its Decision Process.

9.1 Decision Process

The Decision Process selects routes for subsequent advertisement by
applying the policies in the local Policy Information Base (PIB) to
the routes stored in its Adj-RIBs-In. The output of the Decision
Process is the set of routes that will be advertised to peers; the
selected routes will be stored in the local speaker's Adj-RIBs-Out
according to policy.

The selection process is formalized by defining a function that takes
the attributes of a given route as an argument and returns either (a)
a non-negative integer denoting the degree of preference for the
route, or (b) a value denoting that this route is ineligible to be
installed in Loc-RIB and will be excluded from the next phase of
route selection.
The function that calculates the degree of preference for a given
route SHALL NOT use as its inputs any of the following: the existence
of other routes, the non-existence of other routes, or the path
attributes of other routes. Route selection then consists of
individual application of the degree of preference function to each
feasible route, followed by the choice of the one with the highest
degree of preference.

The Decision Process operates on routes contained in the Adj-RIB-In,
and is responsible for:

   - selection of routes to be used locally by the speaker

   - selection of routes to be advertised to other BGP peers

   - route aggregation and route information reduction

The Decision Process takes place in three distinct phases, each
triggered by a different event:

   a) Phase 1 is responsible for calculating the degree of preference
      for each route received from a peer.

   b) Phase 2 is invoked on completion of Phase 1. It is responsible
      for choosing the best route out of all those available for each
      distinct destination, and for installing each chosen route into
      the Loc-RIB.

   c) Phase 3 is invoked after the Loc-RIB has been modified. It is
      responsible for disseminating routes in the Loc-RIB to each
      peer, according to the policies contained in the PIB. Route
      aggregation and information reduction can optionally be
      performed within this phase.

9.1.1 Phase 1: Calculation of Degree of Preference

The Phase 1 decision function is invoked whenever the local BGP
speaker receives from a peer an UPDATE message that advertises a new
route, a replacement route, or withdrawn routes.

The Phase 1 decision function is a separate process which completes
when it has no further work to do.
The Phase 1 decision function locks an Adj-RIB-In prior to operating
on any route contained within it, and unlocks it after operating on
all new or unfeasible routes contained within it.

For each newly received or replacement feasible route, the local BGP
speaker determines a degree of preference as follows:

   If the route is learned from an internal peer, either the value of
   the LOCAL_PREF attribute is taken as the degree of preference, or
   the local system computes the degree of preference of the route
   based on preconfigured policy information. Note that the latter
   (computing the degree of preference based on preconfigured policy
   information) may result in formation of persistent routing loops.

   If the route is learned from an external peer, then the local BGP
   speaker computes the degree of preference based on preconfigured
   policy information. If the return value indicates that the route
   is ineligible, the route MUST NOT serve as an input to the next
   phase of route selection; otherwise the return value is used as
   the LOCAL_PREF value in any IBGP readvertisement.

The exact nature of this policy information and the computation
involved is a local matter.

9.1.2 Phase 2: Route Selection

The Phase 2 decision function is invoked on completion of Phase 1.
The Phase 2 function is a separate process which completes when it
has no further work to do. The Phase 2 process considers all routes
that are eligible in the Adj-RIBs-In.

The Phase 2 decision function is blocked from running while the Phase
3 decision function is in process. The Phase 2 function locks all
Adj-RIBs-In prior to commencing its function, and unlocks them on
completion.
If the NEXT_HOP attribute of a BGP route depicts an address that is
not resolvable, or it would become unresolvable if the route were
installed in the Routing Table, the BGP route MUST be excluded from
the Phase 2 decision function.

If the AS_PATH attribute of a BGP route contains an AS loop, the BGP
route SHOULD be excluded from the Phase 2 decision function. AS loop
detection is done by scanning the full AS path (as specified in the
AS_PATH attribute), and checking that the autonomous system number of
the local system does not appear in the AS path. Operations of a BGP
speaker that is configured to accept routes with its own autonomous
system number in the AS path are outside the scope of this document.

It is critical that BGP speakers within an AS do not make conflicting
decisions regarding route selection that would cause forwarding loops
to occur.

For each set of destinations for which a feasible route exists in the
Adj-RIBs-In, the local BGP speaker identifies the route that:

   a) has the highest degree of preference of any route to the same
      set of destinations, or

   b) is the only route to that destination, or

   c) is selected as a result of the Phase 2 tie breaking rules
      specified in 9.1.2.2.

The local speaker SHALL then install that route in the Loc-RIB,
replacing any route to the same destination that is currently being
held in the Loc-RIB. When the new BGP route is installed in the
Routing Table, care must be taken to ensure that existing routes to
the same destination that are now considered invalid are removed from
the Routing Table. Whether or not the new BGP route replaces an
existing non-BGP route in the Routing Table depends on the policy
configured on the BGP speaker.
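The exclusion and selection rules of this section can be sketched as
follows. This is an illustrative sketch only, not a required
implementation: the Route record, its field names, and the helper
functions are hypothetical, the AS path is assumed to be already
flattened into a list of AS numbers, and resolvability is assumed to
have been computed elsewhere. Tie breaking (9.1.2.2) is deliberately
left out; the last helper simply returns every route tied for the
highest degree of preference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    """Hypothetical route record used only for this sketch."""
    prefix: str
    as_path: List[int]              # all AS numbers, AS_SET members included
    preference: Optional[int]       # None means "ineligible" after Phase 1
    next_hop_resolvable: bool = True


def has_as_loop(route: Route, local_as: int) -> bool:
    """AS loop check: the local AS number appears in the AS path."""
    return local_as in route.as_path


def phase2_candidates(routes: List[Route], local_as: int) -> List[Route]:
    """Drop ineligible, unresolvable, and looping routes."""
    return [
        r for r in routes
        if r.preference is not None
        and r.next_hop_resolvable
        and not has_as_loop(r, local_as)
    ]


def highest_preference(routes: List[Route]) -> List[Route]:
    """Return all routes tied for the highest degree of preference."""
    if not routes:
        return []
    best = max(r.preference for r in routes)
    return [r for r in routes if r.preference == best]
```

If the returned list holds more than one route, the tie-breaking
procedure would be applied to it.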
The local speaker MUST determine the immediate next-hop address from
the NEXT_HOP attribute of the selected route (see Section 5.1.3). If
either the immediate next hop or the IGP cost to the NEXT_HOP (where
the NEXT_HOP is resolved through an IGP route) changes, Phase 2 Route
Selection MUST be performed again.

Notice that even though BGP routes do not have to be installed in the
Routing Table with the immediate next hop(s), implementations MUST
take care that before any packets are forwarded along a BGP route,
its associated NEXT_HOP address is resolved to the immediate
(directly connected) next-hop address and this address (or multiple
addresses) is finally used for actual packet forwarding.

Unresolvable routes SHALL be removed from the Loc-RIB and the Routing
Table. However, corresponding unresolvable routes SHOULD be kept in
the Adj-RIBs-In (in case they become resolvable).

9.1.2.1 Route Resolvability Condition

As indicated in Section 9.1.2, BGP speakers SHOULD exclude
unresolvable routes from the Phase 2 decision. This ensures that only
valid routes are installed in Loc-RIB and the Routing Table.

The route resolvability condition is defined as follows.

   1. A route Rte1, referencing only the intermediate network
      address, is considered resolvable if the Routing Table contains
      at least one resolvable route Rte2 that matches Rte1's
      intermediate network address and is not recursively resolved
      (directly or indirectly) through Rte1. If multiple matching
      routes are available, only the longest matching route SHOULD be
      considered.

   2. Routes referencing interfaces (with or without intermediate
      addresses) are considered resolvable if the state of the
      referenced interface is up and IP processing is enabled on this
      interface.
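The two conditions above can be sketched as a recursive check. This
is a minimal illustration under stated assumptions, not a definitive
implementation: the TableRoute record and function names are
hypothetical, a single boolean stands in for "interface is up and IP
processing is enabled", and a BGP route would be modelled as an entry
whose intermediate address is its NEXT_HOP. Mutual recursion is
detected by remembering the routes already visited.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class TableRoute:
    """Hypothetical Routing Table entry used only for this sketch."""
    prefix: str
    via: Optional[str] = None            # intermediate network address
    interface_up: Optional[bool] = None  # None: no interface referenced


def longest_match(table: List[TableRoute], address: str) -> Optional[TableRoute]:
    """Return the longest-prefix entry covering the address, if any."""
    addr = ipaddress.ip_address(address)
    matches = [r for r in table if addr in ipaddress.ip_network(r.prefix)]
    return max(matches,
               key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
               default=None)


def is_resolvable(route: TableRoute, table: List[TableRoute],
                  seen: Optional[FrozenSet[TableRoute]] = None) -> bool:
    seen = seen or frozenset()
    if route in seen:
        return False                  # resolved through itself: recursion
    if route.interface_up is not None:
        return route.interface_up     # condition 2
    if route.via is None:
        return False
    rte2 = longest_match(table, route.via)  # condition 1, longest match only
    return rte2 is not None and is_resolvable(rte2, table, seen | {route})
```

With this model, a route via 10.0.0.1 is resolvable when 10.0.0.0/24
is a connected route on an interface that is up, while two routes
that point at each other's prefixes both fail the check.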
BGP routes do not refer to interfaces, but can be resolved through
the routes in the Routing Table that can be of both types (those that
specify interfaces or those that do not). IGP routes and routes to
directly connected networks are expected to specify the outbound
interface. Static routes can specify the outbound interface, or the
intermediate address, or both.

Note that a BGP route is considered unresolvable not only in
situations where the BGP speaker's Routing Table contains no route
matching the BGP route's NEXT_HOP. Mutually recursive routes (routes
resolving each other or themselves) also fail the resolvability
check.

It is also important that implementations do not consider feasible
routes that would become unresolvable if they were installed in the
Routing Table even if their NEXT_HOPs are resolvable using the
current contents of the Routing Table (an example of such routes
would be mutually recursive routes). This check ensures that a BGP
speaker does not install in the Routing Table routes that will be
removed and not used by the speaker. Therefore, in addition to local
Routing Table stability, this check also improves behavior of the
protocol in the network.

Whenever a BGP speaker identifies a route that fails the
resolvability check because of mutual recursion, an error message
SHOULD be logged.

9.1.2.2 Breaking Ties (Phase 2)

In its Adj-RIBs-In a BGP speaker may have several routes to the same
destination that have the same degree of preference. The local
speaker can select only one of these routes for inclusion in the
associated Loc-RIB. The local speaker considers all routes with the
same degrees of preference, both those received from internal peers,
and those received from external peers.
The following tie-breaking procedure assumes that for each candidate
route all the BGP speakers within an autonomous system can ascertain
the cost of a path (interior distance) to the address depicted by the
NEXT_HOP attribute of the route, and follow the same route selection
algorithm.

The tie-breaking algorithm begins by considering all equally
preferable routes to the same destination, and then selects routes to
be removed from consideration. The algorithm terminates as soon as
only one route remains in consideration. The criteria MUST be applied
in the order specified.

Several of the criteria are described using pseudo-code. Note that
the pseudo-code shown was chosen for clarity, not efficiency. It is
not intended to specify any particular implementation. BGP
implementations MAY use any algorithm which produces the same results
as those described here.

   a) Remove from consideration all routes which are not tied for
      having the smallest number of AS numbers present in their
      AS_PATH attributes. Note that when counting this number, an
      AS_SET counts as 1, no matter how many ASes are in the set.

   b) Remove from consideration all routes which are not tied for
      having the lowest Origin number in their ORIGIN attribute.

   c) Remove from consideration routes with less-preferred
      MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable
      between routes learned from the same neighboring AS (the
      neighboring AS is determined from the AS_PATH attribute).
      Routes which do not have the MULTI_EXIT_DISC attribute are
      considered to have the lowest possible MULTI_EXIT_DISC value.
      This is also described in the following procedure:

         for m = all routes still under consideration
             for n = all routes still under consideration
                 if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m))
                     remove route m from consideration

      In the pseudo-code above, MED(n) is a function which returns
      the value of route n's MULTI_EXIT_DISC attribute. If route n
      has no MULTI_EXIT_DISC attribute, the function returns the
      lowest possible MULTI_EXIT_DISC value, i.e., 0.

      Similarly, neighborAS(n) is a function which returns the
      neighbor AS from which the route was received. If the route is
      learned via IBGP, and the other IBGP speaker didn't originate
      the route, it is the neighbor AS from which the other IBGP
      speaker learned the route. If the route is learned via IBGP,
      and the other IBGP speaker originated the route, it is the
      local AS.

      If a MULTI_EXIT_DISC attribute is removed before re-advertising
      a route into IBGP, then comparison based on the received EBGP
      MULTI_EXIT_DISC attribute MAY still be performed. If an
      implementation chooses to remove MULTI_EXIT_DISC, then the
      optional comparison on MULTI_EXIT_DISC, if performed at all,
      MUST be performed only among EBGP learned routes. The best EBGP
      learned route may then be compared with IBGP learned routes
      after the removal of the MULTI_EXIT_DISC attribute. If
      MULTI_EXIT_DISC is removed from a subset of EBGP learned routes
      and the selected "best" EBGP learned route will not have
      MULTI_EXIT_DISC removed, then the MULTI_EXIT_DISC must be used
      in the comparison with IBGP learned routes. For IBGP learned
      routes the MULTI_EXIT_DISC MUST be used in route comparisons
      which reach this step in the decision process.
      Including the MULTI_EXIT_DISC of an EBGP learned route in the
      comparison with an IBGP learned route, then removing the
      MULTI_EXIT_DISC attribute and advertising the route has been
      proven to cause route loops.

   d) If at least one of the candidate routes was received via EBGP,
      remove from consideration all routes which were received via
      IBGP.

   e) Remove from consideration any routes with less-preferred
      interior cost. The interior cost of a route is determined by
      calculating the metric to the NEXT_HOP for the route using the
      Routing Table. If the NEXT_HOP for a route is reachable, but no
      cost can be determined, then this step should be skipped
      (equivalently, consider all routes to have equal costs).

      This is also described in the following procedure.

         for m = all routes still under consideration
             for n = all routes still under consideration
                 if (cost(n) is lower than cost(m))
                     remove m from consideration

      In the pseudo-code above, cost(n) is a function which returns
      the cost of the path (interior distance) to the address given
      in the NEXT_HOP attribute of the route.

   f) Remove from consideration all routes other than the route that
      was advertised by the BGP speaker whose BGP Identifier has the
      lowest value.

   g) Prefer the route received from the lowest peer address.

9.1.3 Phase 3: Route Dissemination

The Phase 3 decision function is invoked on completion of Phase 2, or
when any of the following events occur:

   a) when routes in the Loc-RIB to local destinations have changed

   b) when locally generated routes learned by means outside of BGP
      have changed

   c) when a new BGP speaker - BGP speaker connection has been
      established

The Phase 3 function is a separate process which completes when it
has no further work to do.
The Phase 3 Routing Decision function is blocked from running while
the Phase 2 decision function is in process.

All routes in the Loc-RIB are processed into Adj-RIBs-Out according
to configured policy. This policy MAY exclude a route in the Loc-RIB
from being installed in a particular Adj-RIB-Out. A route SHALL NOT
be installed in the Adj-RIB-Out unless the destination and NEXT_HOP
described by this route may be forwarded appropriately by the Routing
Table. If a route in Loc-RIB is excluded from a particular
Adj-RIB-Out, the previously advertised route in that Adj-RIB-Out MUST
be withdrawn from service by means of an UPDATE message (see 9.2).

Route aggregation and information reduction techniques (see 9.2.2.1)
may optionally be applied.

Any local policy which results in routes being added to an
Adj-RIB-Out without also being added to the local BGP speaker's
forwarding table is outside the scope of this document.

When the updating of the Adj-RIBs-Out and the Routing Table is
complete, the local BGP speaker runs the Update-Send process of 9.2.

9.1.4 Overlapping Routes

A BGP speaker may transmit routes with overlapping Network Layer
Reachability Information (NLRI) to another BGP speaker. NLRI overlap
occurs when a set of destinations is identified in non-matching
multiple routes. Since BGP encodes NLRI using IP prefixes, overlap
will always exhibit subset relationships. A route describing a
smaller set of destinations (a longer prefix) is said to be more
specific than a route describing a larger set of destinations (a
shorter prefix); similarly, a route describing a larger set of
destinations is said to be less specific than a route describing a
smaller set of destinations.
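The "more specific" relation defined above can be checked
mechanically. The following is a small sketch using Python's standard
ipaddress module; the function name is an assumption chosen for
illustration, and the prefixes in the usage note are documentation
addresses, not values taken from this document.

```python
import ipaddress


def is_more_specific(a: str, b: str) -> bool:
    """True if prefix a describes a strictly smaller set of
    destinations than prefix b, i.e. a is a longer prefix inside b."""
    net_a = ipaddress.ip_network(a)
    net_b = ipaddress.ip_network(b)
    return net_a != net_b and net_a.subnet_of(net_b)
```

For instance, 192.0.2.128/25 is more specific than 192.0.2.0/24,
whereas 192.0.2.0/24 and 198.51.100.0/24 do not overlap at all and so
are unrelated under this definition.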
The precedence relationship effectively decomposes less specific
routes into two parts:

   - a set of destinations described only by the less specific route,
     and

   - a set of destinations described by the overlap of the less
     specific and the more specific routes

When overlapping routes are present in the same Adj-RIB-In, the more
specific route takes precedence, in order from more specific to least
specific.

The set of destinations described by the overlap represents a portion
of the less specific route that is feasible, but is not currently in
use. If a more specific route is later withdrawn, the set of
destinations described by the overlap will still be reachable using
the less specific route.

If a BGP speaker receives overlapping routes, the Decision Process
MUST consider both routes based on the configured acceptance policy.
If both a less and a more specific route are accepted, then the
Decision Process MUST either install both the less and the more
specific routes or it MUST aggregate the two routes and install the
aggregated route, provided that both routes have the same value of
the NEXT_HOP attribute.

If a BGP speaker chooses to aggregate, then it SHOULD either include
all ASes used to form the aggregate in an AS_SET or add the
ATOMIC_AGGREGATE attribute to the route. This attribute is now
primarily informational. With the elimination of IP routing protocols
that do not support classless routing and the elimination of router
and host implementations that do not support classless routing, there
is no longer a need to de-aggregate. Routes SHOULD NOT be
de-aggregated. A route that carries the ATOMIC_AGGREGATE attribute in
particular MUST NOT be de-aggregated. That is, the NLRI of this route
cannot be made more specific.
Forwarding along such a route does not guarantee that IP packets will
actually traverse only ASes listed in the AS_PATH attribute of the
route.

9.2 Update-Send Process

The Update-Send process is responsible for advertising UPDATE
messages to all peers. For example, it distributes the routes chosen
by the Decision Process to other BGP speakers which may be located in
either the same autonomous system or a neighboring autonomous system.

When a BGP speaker receives an UPDATE message from an internal peer,
the receiving BGP speaker SHALL NOT re-distribute the routing
information contained in that UPDATE message to other internal peers,
unless the speaker acts as a BGP Route Reflector [RFC2796].

As part of Phase 3 of the route selection process, the BGP speaker
has updated its Adj-RIBs-Out. All newly installed routes and all
newly unfeasible routes for which there is no replacement route SHALL
be advertised to its peers by means of an UPDATE message.

A BGP speaker SHOULD NOT advertise a given feasible BGP route from
its Adj-RIB-Out if it would produce an UPDATE message containing the
same BGP route as was previously advertised.

Any routes in the Loc-RIB marked as unfeasible SHALL be removed.
Changes to the reachable destinations within its own autonomous
system SHALL also be advertised in an UPDATE message.

If, due to the limits on the maximum size of an UPDATE message (see
Section 4), a single route doesn't fit into the message, the BGP
speaker MUST NOT advertise the route to its peers and MAY choose to
log an error locally.
9.2.1 Controlling Routing Traffic Overhead

The BGP protocol constrains the amount of routing traffic (that is,
UPDATE messages) in order to limit both the link bandwidth needed to
advertise UPDATE messages and the processing power needed by the
Decision Process to digest the information contained in the UPDATE
messages.

9.2.1.1 Frequency of Route Advertisement

The parameter MinRouteAdvertisementInterval determines the minimum
amount of time that must elapse between advertisement and/or
withdrawal of routes to a particular destination by a BGP speaker to
a peer. This rate limiting procedure applies on a per-destination
basis, although the value of MinRouteAdvertisementInterval is set on
a per BGP peer basis.

Two UPDATE messages sent by a BGP speaker to a peer that advertise
feasible routes and/or withdrawal of unfeasible routes to some common
set of destinations MUST be separated by at least
MinRouteAdvertisementInterval. Clearly, this can only be achieved
precisely by keeping a separate timer for each common set of
destinations. This would be unwarranted overhead. Any technique which
ensures that the interval between two UPDATE messages sent from a BGP
speaker to a peer that advertise feasible routes and/or withdrawal of
unfeasible routes to some common set of destinations will be at least
MinRouteAdvertisementInterval, and will also ensure a constant upper
bound on the interval, is acceptable.

Since fast convergence is needed within an autonomous system, either
(a) the MinRouteAdvertisementInterval used for internal peers SHOULD
be shorter than the MinRouteAdvertisementInterval used for external
peers, or (b) the procedure described in this section SHOULD NOT
apply for routes sent to internal peers.
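One technique that satisfies the separation requirement without a
timer per destination is to batch changes per peer and send at most
one batch per MinRouteAdvertisementInterval. The sketch below is only
an illustration of that idea, under stated assumptions: the class and
method names are hypothetical, routes are represented as opaque
strings (None standing for a withdrawal), the caller supplies the
current time and is assumed to poll regularly so that the interval
stays bounded, and jitter is omitted.

```python
from typing import Dict, Optional


class PeerAdvertiser:
    """Hypothetical per-peer batching of route changes."""

    def __init__(self, interval: float = 30.0) -> None:
        self.interval = interval                 # seconds
        self.last_flush: Optional[float] = None  # time of last UPDATE batch
        # prefix -> selected route; None stands for a withdrawal
        self.pending: Dict[str, Optional[str]] = {}

    def route_changed(self, prefix: str, route: Optional[str]) -> None:
        # A later selection for the same prefix overwrites the earlier
        # one, so only the most recent choice is ever advertised.
        self.pending[prefix] = route

    def flush_if_due(self, now: float) -> Dict[str, Optional[str]]:
        """Return the changes to advertise now, or {} if too early."""
        if not self.pending:
            return {}
        if self.last_flush is not None and now - self.last_flush < self.interval:
            return {}
        batch, self.pending = self.pending, {}
        self.last_flush = now
        return batch
```

Because any two batches sent to the peer are at least one interval
apart, any two advertisements for a common destination are as well.
The trade-off of this design is that an unrelated destination may be
delayed by a recent batch even though its own per-destination
interval has not been consumed.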
This procedure does not limit the rate of route selection, but only
the rate of route advertisement. If new routes are selected multiple
times while awaiting the expiration of MinRouteAdvertisementInterval,
the last route selected SHALL be advertised at the end of
MinRouteAdvertisementInterval.

9.2.1.2 Frequency of Route Origination

The parameter MinASOriginationInterval determines the minimum amount
of time that must elapse between successive advertisements of UPDATE
messages that report changes within the advertising BGP speaker's own
autonomous system.

9.2.2 Efficient Organization of Routing Information

Having selected the routing information which it will advertise, a
BGP speaker may avail itself of several methods to organize this
information in an efficient manner.

9.2.2.1 Information Reduction

Information reduction may imply a reduction in granularity of policy
control - after information is collapsed, the same policies will
apply to all destinations and paths in the equivalence class.

The Decision Process may optionally reduce the amount of information
that it will place in the Adj-RIBs-Out by any of the following
methods:

   a) Network Layer Reachability Information (NLRI):

      Destination IP addresses can be represented as IP address
      prefixes. In cases where there is a correspondence between the
      address structure and the systems under control of an
      autonomous system administrator, it will be possible to reduce
      the size of the NLRI carried in the UPDATE messages.

   b) AS_PATHs:

      AS path information can be represented as ordered AS_SEQUENCEs
      or unordered AS_SETs. AS_SETs are used in the route aggregation
      algorithm described in 9.2.2.2.
      They reduce the size of the AS_PATH information by listing each
      AS number only once, regardless of how many times it may have
      appeared in multiple AS_PATHs that were aggregated.

      An AS_SET implies that the destinations listed in the NLRI can
      be reached through paths that traverse at least some of the
      constituent autonomous systems. AS_SETs provide sufficient
      information to avoid routing information looping; however,
      their use may prune potentially feasible paths, since such
      paths are no longer listed individually in the form of
      AS_SEQUENCEs. In practice this is not likely to be a problem,
      since once an IP packet arrives at the edge of a group of
      autonomous systems, the BGP speaker at that point is likely to
      have more detailed path information and can distinguish
      individual paths to destinations.

9.2.2.2 Aggregating Routing Information

Aggregation is the process of combining the characteristics of
several different routes in such a way that a single route can be
advertised. Aggregation can occur as part of the decision process to
reduce the amount of routing information that will be placed in the
Adj-RIBs-Out.

Aggregation reduces the amount of information that a BGP speaker must
store and exchange with other BGP speakers. Routes can be aggregated
by applying the following procedure separately to path attributes of
like type and to the Network Layer Reachability Information.

Routes that have different MULTI_EXIT_DISC attributes SHALL NOT be
aggregated.

Path attributes that have different type codes cannot be aggregated
together.
Path attributes of the same type code may be aggregated, according to
the following rules:

   NEXT_HOP:
      When aggregating routes that have different NEXT_HOP
      attributes, the NEXT_HOP attribute of the aggregated route
      SHALL identify an interface on the BGP speaker that performs
      the aggregation.

   ORIGIN attribute:
      If at least one route among routes that are aggregated has
      ORIGIN with the value INCOMPLETE, then the aggregated route
      MUST have the ORIGIN attribute with the value INCOMPLETE.
      Otherwise, if at least one route among routes that are
      aggregated has ORIGIN with the value EGP, then the aggregated
      route MUST have the ORIGIN attribute with the value EGP. In all
      other cases the value of the ORIGIN attribute of the aggregated
      route is IGP.

   AS_PATH attribute:
      If routes to be aggregated have identical AS_PATH attributes,
      then the aggregated route has the same AS_PATH attribute as
      each individual route.

      For the purpose of aggregating AS_PATH attributes we model each
      AS within the AS_PATH attribute as a tuple <type, value>, where
      "type" identifies a type of the path segment the AS belongs to
      (e.g. AS_SEQUENCE, AS_SET), and "value" is the AS number. If
      the routes to be aggregated have different AS_PATH attributes,
      then the aggregated AS_PATH attribute SHALL satisfy all of the
      following conditions:

         - all tuples of type AS_SEQUENCE in the aggregated AS_PATH
           SHALL appear in all of the AS_PATHs in the initial set of
           routes to be aggregated.

         - all tuples of type AS_SET in the aggregated AS_PATH SHALL
           appear in at least one of the AS_PATHs in the initial set
           (they may appear as either AS_SET or AS_SEQUENCE types).

         - for any tuple X of type AS_SEQUENCE in the aggregated
           AS_PATH which precedes tuple Y in the aggregated AS_PATH,
           X precedes Y in each AS_PATH in the initial set which
           contains Y, regardless of the type of Y.
         - No tuple of type AS_SET with the same value SHALL appear
           more than once in the aggregated AS_PATH.

         - Multiple tuples of type AS_SEQUENCE with the same value
           may appear in the aggregated AS_PATH only when adjacent to
           another tuple of the same type and value.

      An implementation may choose any algorithm which conforms to
      these rules. At a minimum a conformant implementation SHALL be
      able to perform the following algorithm that meets all of the
      above conditions:

         - determine the longest leading sequence of tuples (as
           defined above) common to all the AS_PATH attributes of the
           routes to be aggregated. Make this sequence the leading
           sequence of the aggregated AS_PATH attribute.

         - set the type of the rest of the tuples from the AS_PATH
           attributes of the routes to be aggregated to AS_SET, and
           append them to the aggregated AS_PATH attribute.

         - if the aggregated AS_PATH has more than one tuple with the
           same value (regardless of tuple's type), eliminate all but
           one such tuple by deleting tuples of the type AS_SET from
           the aggregated AS_PATH attribute.

         - for each pair of adjacent tuples in the aggregated
           AS_PATH, if both tuples have the same type, merge them
           together, as long as doing so will not cause a segment
           with length greater than 255 to be generated.

      Appendix F, Section F.6 presents another algorithm that
      satisfies the conditions and allows for more complex policy
      configurations.

   ATOMIC_AGGREGATE:
      If at least one of the routes to be aggregated has the
      ATOMIC_AGGREGATE path attribute, then the aggregated route
      SHALL have this attribute as well.

   AGGREGATOR:
      Any AGGREGATOR attributes from the routes to be aggregated MUST
      NOT be included in the aggregated route. The BGP speaker
      performing the route aggregation MAY attach a new AGGREGATOR
      attribute (see Section 5.1.7).
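The minimum AS_PATH aggregation algorithm and the ORIGIN rule above
can be sketched as follows. This is an illustrative reading, not a
reference implementation: the tuple representation follows the
<type, value> model of this section, but the function names, the
string type tags, and the sorting of AS_SET members (done here only
to make the output deterministic) are assumptions. The ORIGIN helper
relies on the ORIGIN codes being ordered IGP (0), EGP (1), INCOMPLETE
(2).

```python
from typing import List, Tuple

SEQ, SET = "AS_SEQUENCE", "AS_SET"
AsTuple = Tuple[str, int]  # <type, value>


def aggregate_as_paths(paths: List[List[AsTuple]]) -> List[AsTuple]:
    # Step 1: longest leading run of tuples common to every AS_PATH.
    lead: List[AsTuple] = []
    for column in zip(*paths):
        if any(t != column[0] for t in column):
            break
        lead.append(column[0])
    # Steps 2 and 3: retype the remaining tuples as AS_SET, keeping one
    # tuple per value and dropping values already in the leading run.
    lead_values = {value for _, value in lead}
    rest = {value for path in paths for _, value in path[len(lead):]}
    return lead + [(SET, v) for v in sorted(rest - lead_values)]


def to_segments(tuples: List[AsTuple], max_len: int = 255):
    # Step 4: merge adjacent tuples of the same type into one segment,
    # never letting a segment grow beyond max_len entries.
    segments: List[Tuple[str, List[int]]] = []
    for kind, value in tuples:
        if segments and segments[-1][0] == kind and len(segments[-1][1]) < max_len:
            segments[-1][1].append(value)
        else:
            segments.append((kind, [value]))
    return segments


def aggregate_origin(origins: List[int]) -> int:
    # INCOMPLETE (2) wins over EGP (1), which wins over IGP (0).
    return max(origins)
```

Aggregating the paths (1 2 3) and (1 2 4 5) under this sketch would
give the leading AS_SEQUENCE (1 2) followed by the AS_SET {3, 4, 5}.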
9.3 Route Selection Criteria

Generally speaking, additional rules for comparing routes among
several alternatives are outside the scope of this document. There
are two exceptions:

   - If the local AS appears in the AS path of the new route being
     considered, then that new route cannot be viewed as better than
     any other route (provided that the speaker is configured to
     accept such routes). If such a route were ever used, a routing
     loop could result.

   - In order to achieve successful distributed operation, only
     routes with a likelihood of stability can be chosen. Thus, an AS
     SHOULD avoid using unstable routes, and it SHOULD NOT make rapid
     spontaneous changes to its choice of route. Quantifying the
     terms "unstable" and "rapid" in the previous sentence will
     require experience, but the principle is clear.

Care must be taken to ensure that BGP speakers in the same AS do not
make inconsistent decisions.

9.4 Originating BGP routes

A BGP speaker may originate BGP routes by injecting routing
information acquired by some other means (e.g. via an IGP) into BGP.
A BGP speaker that originates BGP routes assigns the degree of
preference to these routes by passing them through the Decision
Process (see Section 9.1). These routes MAY also be distributed to
other BGP speakers within the local AS as part of the update process
(see Section 9.2). The decision whether to distribute non-BGP
acquired routes within an AS via BGP or not depends on the
environment within the AS (e.g. type of IGP) and SHOULD be controlled
via configuration.

10. BGP Timers

BGP employs five timers: ConnectRetry (see Section 8), Hold Time (see
Section 4.2), KeepAlive (see Section 8), MinASOriginationInterval
(see Section 9.2.1.2), and MinRouteAdvertisementInterval (see Section
9.2.1.1).
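The default values and the jitter rule that this section suggests for
these timers can be sketched as below. This is only an illustration:
the dictionary keys and the function name are hypothetical, the
KeepAlive entry is derived as one third of the default Hold Time, and
the sketch draws a fresh random factor in the range 0.75 to 1.0 on
every call, as suggested for each time a timer is set.

```python
import random

# Suggested defaults, in seconds.
DEFAULTS = {
    "ConnectRetry": 120.0,
    "HoldTime": 90.0,
    "KeepAlive": 90.0 / 3,  # one third of the Hold Time
    "MinASOriginationInterval": 15.0,
    "MinRouteAdvertisementInterval": 30.0,
}

# Timers to which jitter is applied.
JITTERED = {
    "ConnectRetry",
    "KeepAlive",
    "MinASOriginationInterval",
    "MinRouteAdvertisementInterval",
}


def timer_value(name: str, rng=random) -> float:
    """Return the value to arm the named timer with (hypothetical helper)."""
    base = DEFAULTS[name]
    if name in JITTERED:
        # Uniformly distributed factor between 0.75 and 1.0.
        return base * rng.uniform(0.75, 1.0)
    return base
```

Under these assumptions a ConnectRetry timer would be armed with a
value between 90 and 120 seconds, while the Hold Time is left
unjittered.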

   The suggested default value for the ConnectRetry timer is 120
   seconds.

   The suggested default value for the Hold Time is 90 seconds.

   The suggested default value for the KeepAlive timer is 1/3 of the
   Hold Time.

   The suggested default value for the MinASOriginationInterval is 15
   seconds.

   The suggested default value for the MinRouteAdvertisementInterval is
   30 seconds.

   An implementation of BGP MUST allow the Hold Time timer to be
   configurable on a per peer basis, and MAY allow the other timers to
   be configurable.

   To minimize the likelihood that the distribution of BGP messages by a
   given BGP speaker will contain peaks, jitter SHOULD be applied to the
   timers associated with MinASOriginationInterval, KeepAlive,
   MinRouteAdvertisementInterval, and ConnectRetry. A given BGP speaker
   MAY apply the same jitter to each of these quantities regardless of
   the destinations to which the updates are being sent; that is, jitter
   need not be configured on a "per peer" basis.

   The suggested default amount of jitter SHALL be determined by
   multiplying the base value of the appropriate timer by a random
   factor which is uniformly distributed in the range from 0.75 to 1.0.
   A new random value SHOULD be picked each time the timer is set. The
   range of the jitter random value MAY be configurable.

Appendix A. Comparison with RFC1771

   There are numerous editorial changes (too many to list here).

   The following lists the technical changes:

      Changes to reflect the usages of such features as TCP MD5
      [RFC2385], BGP Route Reflectors [RFC2796], BGP Confederations
      [RFC3065], and BGP Route Refresh [RFC2918].

      Clarification on the use of the BGP Identifier in the AGGREGATOR
      attribute.

      Procedures for imposing an upper bound on the number of prefixes
      that a BGP speaker would accept from a peer.

      The ability of a BGP speaker to include more than one instance of
      its own AS in the AS_PATH attribute for the purpose of inter-AS
      traffic engineering.

      Clarifications on the various types of NEXT_HOPs.

      Clarifications to the use of the ATOMIC_AGGREGATE attribute.

      The relationship between the immediate next hop, and the next hop
      as specified in the NEXT_HOP path attribute.

      Clarifications on the tie-breaking procedures.

      Clarifications on the frequency of route advertisements.

      Optional Parameter Type 1 (Authentication Information) has been
      deprecated.

      UPDATE Message Error subcode 7 (AS Routing Loop) has been
      deprecated.

      OPEN Message Error subcode 5 (Authentication Failure) has been
      deprecated.

      Use of the Marker field for authentication has been deprecated.

      Use of TCP MD5 [RFC2385] for authentication is mandatory.

Appendix B. Comparison with RFC1267

   All the changes listed in Appendix A, plus the following.

   BGP-4 is capable of operating in an environment where a set of
   reachable destinations may be expressed via a single IP prefix. The
   concept of network classes, or subnetting, is foreign to BGP-4. To
   accommodate these capabilities BGP-4 changes semantics and encoding
   associated with the AS_PATH attribute. New text has been added to
   define semantics associated with IP prefixes. These abilities allow
   BGP-4 to support the proposed supernetting scheme [9].

   To simplify configuration this version introduces a new attribute,
   LOCAL_PREF, that facilitates route selection procedures.

   The INTER_AS_METRIC attribute has been renamed to be MULTI_EXIT_DISC.
   A new attribute, ATOMIC_AGGREGATE, has been introduced to ensure that
   certain aggregates are not de-aggregated.
   Another new attribute, AGGREGATOR, can be added to aggregate routes
   in order to advertise which AS and which BGP speaker within that AS
   caused the aggregation.

   To ensure that Hold Timers are symmetric, the Hold Time is now
   negotiated on a per-connection basis. Hold Times of zero are now
   supported.

Appendix C. Comparison with RFC 1163

   All of the changes listed in Appendices A and B, plus the following.

   To detect and recover from BGP connection collision, a new field (BGP
   Identifier) has been added to the OPEN message. New text (Section
   6.8) has been added to specify the procedure for detecting and
   recovering from collision.

   The new document no longer restricts the router that is passed in the
   NEXT_HOP path attribute to be part of the same Autonomous System as
   the BGP Speaker.

   The new document optimizes and simplifies the exchange of the
   information about previously reachable routes.

Appendix D. Comparison with RFC 1105

   All of the changes listed in Appendices A, B and C, plus the
   following.

   Minor changes to the RFC1105 Finite State Machine were necessary to
   accommodate the TCP user interface provided by 4.3 BSD.

   The notion of Up/Down/Horizontal relations present in RFC1105 has
   been removed from the protocol.

   The changes in the message format from RFC1105 are as follows:

      1. The Hold Time field has been removed from the BGP header and
         added to the OPEN message.

      2. The version field has been removed from the BGP header and
         added to the OPEN message.

      3. The Link Type field has been removed from the OPEN message.

      4. The OPEN CONFIRM message has been eliminated and replaced with
         implicit confirmation provided by the KEEPALIVE message.

      5. The format of the UPDATE message has been changed
         significantly.
         New fields were added to the UPDATE message to support
         multiple path attributes.

      6. The Marker field has been expanded and its role broadened to
         support authentication.

      Note that quite often BGP as specified in RFC 1105 is referred
      to as BGP-1; BGP as specified in RFC 1163 is referred to as
      BGP-2; BGP as specified in RFC1267 is referred to as BGP-3; and
      BGP as specified in this document is referred to as BGP-4.

Appendix E. TCP options that may be used with BGP

   If a local system TCP user interface supports the TCP PUSH function,
   then each BGP message SHOULD be transmitted with the PUSH flag set.
   Setting the PUSH flag forces BGP messages to be transmitted promptly
   to the receiver.

   If a local system TCP user interface supports setting of the DSCP
   field [RFC2474] for TCP connections, then the TCP connection used by
   BGP SHOULD be opened with bits 0-2 of the DSCP field set to 110
   (binary).

Appendix F. Implementation Recommendations

   This section presents some implementation recommendations.

Appendix F.1 Multiple Networks Per Message

   The BGP protocol allows for multiple address prefixes with the same
   path attributes to be specified in one message. Making use of this
   capability is highly recommended. With one address prefix per message
   there is a substantial increase in overhead in the receiver. Not only
   does the system overhead increase due to the reception of multiple
   messages, but the overhead of scanning the routing table for updates
   to BGP peers and other routing protocols (and sending the associated
   messages) is incurred multiple times as well.

   One method of building messages containing many address prefixes per
   path attribute set from a routing table that is not organized on a
   per path attribute set basis is to build many messages as the routing
   table is scanned.
   As each address prefix is processed, a message for the associated set
   of path attributes is allocated, if it does not exist, and the new
   address prefix is added to it. If such a message exists, the new
   address prefix is just appended to it. If the message lacks the space
   to hold the new address prefix, it is transmitted, a new message is
   allocated, and the new address prefix is inserted into the new
   message. When the entire routing table has been scanned, all
   allocated messages are sent and their resources released. Maximum
   compression is achieved when all the destinations covered by the
   address prefixes share a common set of path attributes, making it
   possible to send many address prefixes in one 4096-byte message.

   When peering with a BGP implementation that does not compress
   multiple address prefixes into one message, it may be necessary to
   take steps to reduce the overhead from the flood of data received
   when a peer is acquired or a significant network topology change
   occurs. One method of doing this is to limit the rate of updates.
   This will eliminate the redundant scanning of the routing table to
   provide flash updates for BGP peers and other routing protocols. A
   disadvantage of this approach is that it increases the propagation
   latency of routing information. By choosing a minimum flash update
   interval that is not much greater than the time it takes to process
   the multiple messages, this latency should be minimized. A better
   method would be to read all received messages before sending updates.

Appendix F.2 Reducing route flapping

   To avoid excessive route flapping a BGP speaker which needs to
   withdraw a destination and send an update about a more specific or
   less specific route SHOULD combine them into the same UPDATE message.
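
   The message-building method described in Appendix F.1 (one open
   message per path attribute set, transmitted when full and again at
   the end of the scan) can be sketched as follows. This is a
   simplified illustration, not a wire encoder. The function names,
   the attrs_key values and the attr_size callback are hypothetical.
   The size accounting assumes the 19-octet message header, the two
   2-octet length fields of the UPDATE message, and one length octet
   plus the minimum number of prefix octets per address prefix.

```python
MAX_MESSAGE = 4096            # maximum BGP message size in octets
FIXED_OVERHEAD = 19 + 2 + 2   # header + withdrawn length + attribute length


def nlri_size(prefix_len):
    """Octets needed for one prefix: a length octet plus the prefix bits."""
    return 1 + (prefix_len + 7) // 8


def pack_updates(routes, attr_size, max_message=MAX_MESSAGE):
    """Group prefixes that share path attributes into as few messages as fit.

    routes    -- iterable of (prefix, prefix_len, attrs_key), in scan order
    attr_size -- function mapping attrs_key to its encoded size in octets
    Returns the messages in transmission order as (attrs_key, [prefixes]).
    """
    open_msgs = {}            # attrs_key -> [prefixes, octets used so far]
    sent = []
    for prefix, prefix_len, key in routes:
        size = nlri_size(prefix_len)
        if key not in open_msgs:
            # No message exists yet for this attribute set: allocate one.
            open_msgs[key] = [[], FIXED_OVERHEAD + attr_size(key)]
        msg = open_msgs[key]
        if msg[0] and msg[1] + size > max_message:
            # No room left: transmit the message and allocate a new one.
            sent.append((key, msg[0]))
            msg = open_msgs[key] = [[], FIXED_OVERHEAD + attr_size(key)]
        msg[0].append((prefix, prefix_len))
        msg[1] += size
    # Scan finished: send every message that is still open.
    for key, (prefixes, _) in open_msgs.items():
        sent.append((key, prefixes))
    return sent
```

   Keying the open messages by attribute set is what lets a routing
   table that is not sorted by path attributes still be sent with many
   prefixes per message.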

Appendix F.3 Path attribute ordering

   Implementations which combine update messages as described above in
   6.1 may prefer to see all path attributes presented in a known order.
   This permits them to quickly identify sets of attributes from
   different update messages which are semantically identical. To
   facilitate this, it is a useful optimization to order the path
   attributes according to type code. This optimization is entirely
   optional.

Appendix F.4 AS_SET sorting

   Another useful optimization that can be done to simplify this
   situation is to sort the AS numbers found in an AS_SET. This
   optimization is entirely optional.

Appendix F.5 Control over version negotiation

   Since BGP-4 is capable of carrying aggregated routes which cannot be
   properly represented in BGP-3, an implementation which supports BGP-4
   and another BGP version should provide the capability to only speak
   BGP-4 on a per-peer basis.

Appendix F.6 Complex AS_PATH aggregation

   An implementation which chooses to provide a path aggregation
   algorithm which retains significant amounts of path information may
   wish to use the following procedure:

      For the purpose of aggregating AS_PATH attributes of two routes,
      we model each AS as a tuple <type, value>, where "type" identifies
      a type of the path segment the AS belongs to (e.g. AS_SEQUENCE,
      AS_SET), and "value" is the AS number. Two ASs are said to be the
      same if their corresponding tuples are the same.

      The algorithm to aggregate two AS_PATH attributes works as
      follows:

         a) Identify the same ASs (as defined above) within each AS_PATH
         attribute that are in the same relative order within both
         AS_PATH attributes. Two ASs, X and Y, are said to be in the
         same order if either:

            - X precedes Y in both AS_PATH attributes, or
            - Y precedes X in both AS_PATH attributes.

         b) The aggregated AS_PATH attribute consists of ASs identified
         in (a) in exactly the same order as they appear in the AS_PATH
         attributes to be aggregated. If two consecutive ASs identified
         in (a) do not immediately follow each other in both of the
         AS_PATH attributes to be aggregated, then the intervening ASs
         (ASs that are between the two consecutive ASs that are the
         same) in both attributes are combined into an AS_SET path
         segment that consists of the intervening ASs from both AS_PATH
         attributes; this segment is then placed in between the two
         consecutive ASs identified in (a) of the aggregated attribute.
         If two consecutive ASs identified in (a) immediately follow
         each other in one attribute, but do not follow in another, then
         the intervening ASs of the latter are combined into an AS_SET
         path segment; this segment is then placed in between the two
         consecutive ASs identified in (a) of the aggregated attribute.

         c) For each pair of adjacent tuples in the aggregated AS_PATH,
         if both tuples have the same type, merge them together, as long
         as doing so will not cause a segment with length greater than
         255 to be generated.

      If as a result of the above procedure a given AS number appears
      more than once within the aggregated AS_PATH attribute, all but
      the last instance (rightmost occurrence) of that AS number SHOULD
      be removed from the aggregated AS_PATH attribute.

Security Considerations

   The authentication mechanism that an implementation of BGP MUST
   support is specified in [RFC2385]. The authentication provided by
   this mechanism could be done on a per peer basis.

   Security issues with BGP routing information dissemination are
   discussed in [XXX].

IANA Considerations

   All extensions to this protocol, including new message types and Path
   Attributes MUST only be made using the Standards Action process
   defined in [RFC2434].

Normative References

   [RFC791] Postel, J., "Internet Protocol - DARPA Internet Program
   Protocol Specification", RFC791, September 1981.

   [RFC793] Postel, J., "Transmission Control Protocol - DARPA Internet
   Program Protocol Specification", RFC793, September 1981.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
   Signature Option", RFC2385, August 1998.

   [RFC2434] Narten, T., Alvestrand, H., "Guidelines for Writing an IANA
   Considerations Section in RFCs", RFC2434, October 1998.

   [RFC2474] Nichols, K., et al., "Definition of the Differentiated
   Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC2474,
   December 1998.

Non-normative References

   [RFC904] Mills, D., "Exterior Gateway Protocol Formal Specification",
   RFC904, April 1984.

   [RFC1092] Rekhter, Y., "EGP and Policy Based Routing in the New
   NSFNET Backbone", RFC1092, February 1989.

   [RFC1093] Braun, H-W., "The NSFNET Routing Architecture", RFC1093,
   February 1989.

   [RFC1772] Rekhter, Y., and P. Gross, "Application of the Border
   Gateway Protocol in the Internet", RFC1772, March 1995.

   [RFC1518] Rekhter, Y., Li, T., "An Architecture for IP Address
   Allocation with CIDR", RFC 1518, September 1993.

   [RFC1519] Fuller, V., Li, T., Yu, J., and Varadhan, K., "Classless
   Inter-Domain Routing (CIDR): an Address Assignment and Aggregation
   Strategy", RFC1519, September 1993.

   [RFC1997] R. Chandra, P. Traina, T. Li, "BGP Communities Attribute",
   RFC 1997, August 1996.

   [RFC2439] C. Villamizar, R. Chandra, R.
   Govindan, "BGP Route Flap Damping", RFC2439, November 1998.

   [RFC2796] Bates, T., Chandra, R., Chen, E., "BGP Route Reflection -
   An Alternative to Full Mesh IBGP", RFC2796, April 2000.

   [RFC2842] R. Chandra, J. Scudder, "Capabilities Advertisement with
   BGP-4", RFC2842.

   [RFC2858] T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol
   Extensions for BGP-4", RFC2858.

   [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC2918,
   September 2000.

   [RFC3065] Traina, P., McPherson, D., Scudder, J., "Autonomous System
   Confederations for BGP", RFC3065, February 2001.

   [IS10747] "Information Processing Systems - Telecommunications and
   Information Exchange between Systems - Protocol for Exchange of
   Inter-domain Routeing Information among Intermediate Systems to
   Support Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993.

   [XXX] Murphy, S., "BGP Security Vulnerabilities Analysis",
   draft-ietf-idr-bgp-vuln-00.txt, work in progress.

Editors' Addresses

   Yakov Rekhter
   Juniper Networks
   email: yakov@juniper.net

   Tony Li
   Procket Networks, Inc.
   email: tli@procket.com

   Susan Hares
   NextHop Technologies, Inc.
   email: skh@nexthop.com