idnits 2.17.1 draft-ietf-bgmp-spec-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There is 1 instance of too long lines in the document, the longest one being 3 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 458 has weird spacing: '... target list ...' == Line 1451 has weird spacing: '...is less than...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (December 2002) is 7775 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'BR32' is mentioned on line 248, but not defined == Missing Reference: 'BR41' is mentioned on line 248, but not defined == Missing Reference: 'BR31' is mentioned on line 250, but not defined == Missing Reference: 'BR42' is mentioned on line 250, but not defined == Missing Reference: 'BR43' is mentioned on line 250, but not defined == Missing Reference: 'BR22' is mentioned on line 252, but not defined == Missing Reference: 'BR52' is mentioned on line 252, but not defined == Missing Reference: 'BR53' is mentioned on line 252, but not defined == Missing Reference: 'BR21' is mentioned on line 254, but not defined == Missing Reference: 'BR51' is mentioned on line 254, but not defined == Missing Reference: 'BR12' is mentioned on line 256, but not defined == Missing Reference: 'BR61' is mentioned on line 256, but not defined == Missing Reference: 'BR13' is mentioned on line 258, but not defined == Missing Reference: 'BR71' is mentioned on line 262, but not defined == Missing Reference: 'BR81' is mentioned on line 262, but not defined == Missing Reference: 'BRXY' is mentioned on line 266, but not defined == Missing Reference: 'PIM-SM' is mentioned on line 289, but not defined == Unused Reference: 'DVMRP' is defined on line 1921, but no explicit reference was found in the text == Unused Reference: 'MOSPF' is defined on line 1933, but no explicit reference was found in the text == Unused Reference: 'PIMDM' is defined on line 1937, but no explicit reference was found in the text ** Downref: Normative reference to an Informational 
RFC: RFC 2715 (ref. 'INTEROP') ** Obsolete normative reference: RFC 2401 (ref. 'IPSEC') (Obsoleted by RFC 4301) -- Possible downref: Normative reference to a draft: ref. 'V4PREFIX' -- Unexpected draft version: The latest known version of draft-ietf-ipngwg-uni-based-mcast is -02, but you're referring to -03. -- Obsolete informational reference (is this intentional?): RFC 1771 (ref. 'BGP') (Obsoleted by RFC 4271) -- Obsolete informational reference (is this intentional?): RFC 2283 (ref. 'MBGP') (Obsoleted by RFC 2858) == Outdated reference: A later version (-11) exists of draft-ietf-idmr-dvmrp-v3-10 == Outdated reference: A later version (-05) exists of draft-ietf-idmr-membership-reports-04 -- Obsolete informational reference (is this intentional?): RFC 2373 (ref. 'IPv6AA') (Obsoleted by RFC 3513) == Outdated reference: A later version (-05) exists of draft-ietf-pim-dm-new-v2-01 -- Obsolete informational reference (is this intentional?): RFC 2362 (ref. 'PIMSM') (Obsoleted by RFC 4601, RFC 5059) -- Obsolete informational reference (is this intentional?): RFC 1966 (ref. 'REFLECT') (Obsoleted by RFC 4456) Summary: 5 errors (**), 0 flaws (~~), 28 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 BGMP Working Group D. Thaler 3 Internet Engineering Task Force Microsoft 4 INTERNET-DRAFT 30 June 2002 5 Expires December 2002 7 Border Gateway Multicast Protocol (BGMP): 8 Protocol Specification 9 11 Status of this Memo 13 This document is an Internet-Draft and is in full conformance with all 14 provisions of Section 10 of RFC2026. 16 Internet-Drafts are working documents of the Internet Engineering Task 17 Force (IETF), its areas, and its working groups. Note that other groups 18 may also distribute working documents as Internet-Drafts. 
20 Internet-Drafts are draft documents valid for a maximum of six months
21 and may be updated, replaced, or obsoleted by other documents at any
22 time. It is inappropriate to use Internet-Drafts as reference material
23 or to cite them other than as "work in progress."

25 The list of current Internet-Drafts can be accessed at
26 http://www.ietf.org/1id-abstracts.html

28 The list of Internet-Draft Shadow Directories can be accessed at
29 http://www.ietf.org/shadow.html

31 Abstract

33 This document describes BGMP, a protocol for inter-domain multicast
34 routing. BGMP builds shared trees for active multicast groups, and
35 optionally allows receiver domains to build source-specific, inter-
36 domain, distribution branches where needed. BGMP natively supports
37 Draft BGMP June 2002

39 "source-specific multicast" (SSM). To also support "any-source
40 multicast" (ASM), BGMP requires that each multicast group be associated
41 with a single root (in BGMP it is referred to as the root domain). It
42 requires that different ranges of the class D space are associated
43 (e.g., with Unicast-Prefix-Based Multicast addressing) with different
44 domains. Each of these domains then becomes the root of the shared
45 domain-trees for all groups in its range. Multicast participants will
46 generally receive better multicast service if the session initiator's
47 address allocator selects addresses from its own domain's part of the
48 space, thereby causing the root domain to be local to at least one of
49 the session participants.

51 Copyright Notice

53 Copyright (C) The Internet Society (2002). All Rights Reserved.

55 1. Acknowledgements

57 In addition to the editors, the following individuals have
58 contributed to the design of BGMP: Cengiz Alaettinoglu, Tony
59 Ballardie, Steve Casner, Steve Deering, Deborah Estrin, Dino
60 Farinacci, Bill Fenner, Mark Handley, Ahmed Helmy, Van Jacobson, Dave
61 Meyer, and Satish Kumar.
63 This document is the product of the IETF BGMP Working Group with Dave
64 Thaler as editor.

66 Rusty Eddy, Isidor Kouvelas, and Pavlin Radoslavov also provided
67 valuable feedback on this document.

69 2. Purpose

71 It has been suggested that inter-domain "any-source" multicast is
72 better supported with a rendezvous mechanism whereby members receive
73 sources' data packets without any sort of global broadcast (e.g.,
74 MSDP broadcasts source information, PIM-DM and DVMRP broadcast
75 initial data packets, and MOSPF broadcasts membership information).
76 PIM-SM [PIMSM] and CBT [CBT] use a shared group-tree, to which all
77 members join and thereby hear from all sources (and to which non-
78 members do not join and thereby hear from no sources).

80 This document describes BGMP, a protocol for inter-domain multicast
81 routing. BGMP natively supports "source-specific multicast" (SSM).
85 To also support "any-source multicast" (ASM), BGMP builds shared
86 trees for active multicast groups, and allows domains to build
87 source-specific, inter-domain, distribution branches where needed.
88 Building upon concepts from PIM-SM and CBT, BGMP requires that each
89 global multicast group be associated with a single root. However, in
90 BGMP, the root is an entire exchange or domain, rather than a single
91 router.

93 For non-source-specific groups, BGMP assumes that ranges of the class
94 D space have been associated (e.g., with Unicast-Prefix-Based
95 Multicast [V4PREFIX,V6PREFIX] addressing) with selected domains.
96 Each such domain then becomes the root of the shared domain-trees for
97 all groups in its range. An address allocator will generally achieve
98 better distribution trees if it takes its multicast addresses from
99 its own domain's part of the space, thereby causing the root domain
100 to be local.

102 BGMP uses TCP as its transport protocol.
This eliminates the need to
103 implement message fragmentation, retransmission, acknowledgement, and
104 sequencing. BGMP uses TCP port 264 for establishing its connections.
105 This port is distinct from BGP's port to provide protocol
106 independence, and to facilitate distinguishing between protocol
107 packets (e.g., by packet classifiers, diagnostic utilities, etc.).

109 Two BGMP peers form a TCP connection between one another, and
110 exchange messages to open and confirm the connection parameters.
111 They then send incremental Join/Prune Updates as group memberships
112 change. BGMP does not require periodic refresh of individual
113 entries. KeepAlive messages are sent periodically to ensure the
114 liveness of the connection. Notification messages are sent in
115 response to errors or special conditions. If a connection encounters
116 an error condition, a notification message is sent and the connection
117 is closed if the error is a fatal one.

119 3. Revision History

121 29 June 2002 draft-01

123 (1) Removed all references to MASC and G-RIB. The current spec only
124     covers BGMP operation for source-specific groups, and any-source-
125     multicast using unicast prefix-based multicast addresses (for
126     both IPv4 and IPv6). No new routes of any type are needed in the
127     routing table.

131 (2) Removed section on transitioning away from using DVMRP as the
132     backbone to an AS-based multicast routing system with MBGP, as
133     this has already happened.

135 4. Terminology

137 This document uses the following technical terms:

139 Domain:
140    A set of one or more contiguous links and zero or more routers
141    surrounded by one or more multicast border routers. Note that this
142    loose definition of domain also applies to an external link between
143    two domains, as well as an exchange.

145 Root Domain:
146    When constructing a shared tree of domains for some group, one
147    domain will be the "root" of the tree.
The root domain receives
148    data from each sender to the group, and functions as a rendezvous
149    domain toward which member domains can send inter-domain joins, and
150    to which sender domains can send data.

152 Multicast RIB:
153    The Routing Information Base, or routing table, used to calculate
154    the "next-hop" towards a particular address for multicast traffic.

156 Multicast IGP (M-IGP):
157    A generic term for any multicast routing protocol used for tree
158    construction within a domain. Typical examples of M-IGPs are: PIM-
159    SM, PIM-DM, DVMRP, MOSPF, and CBT.

161 EGP: A generic term for the interdomain unicast routing protocol in use.
162    Typically, this will be some version of BGP which can support a
163    Multicast RIB, such as MBGP [MBGP], containing both unicast and
164    multicast address prefixes.

166 Component:
167    The portion of a border router associated with (and logically
168    inside) a particular domain that runs the multicast IGP (M-IGP) for
169    that domain, if any. Each border router thus has zero or more
170    components inside routing domains. In addition, each border router
171    with external links that do not fall inside any routing domain will
172    have an inter-domain component that runs BGMP.

176 External peer:
177    A border router in another multicast AS (autonomous system, as used
178    in BGP), to which a BGMP TCP-connection is open. If BGP is being
179    used as the EGP, a separate "eBGP" TCP-connection will also be open
180    to the same peer.

182 Internal peer:
183    Another border router of the same multicast AS. If BGP is being
184    used as the EGP, the border router either speaks iBGP ("internal"
185    BGP) directly to internal peers in a full mesh, or indirectly
186    through a route reflector [REFLECT].
188 Next-hop peer:
189    The next-hop peer towards a given IP address is the next EGP router
190    on the path to the given address, according to multicast RIB routes
191    in the EGP's routing table (e.g., in MBGP, routes whose Subsequent
192    Address Family Identifier field indicates that the route is valid
193    for multicast traffic).

195 Target:
196    Either an EGP peer, or an M-IGP component.

198 Tree State Table:
199    This is a table of (S-prefix,G) and (*,G-prefix) entries that have
200    been explicitly joined by a set of targets. Each entry has, in
201    addition to the source and group addresses and masks, a list of
202    targets that have explicitly requested data (on behalf of directly
203    connected hosts or downstream routers). (S,G) entries also have an
204    "SPT" bit.

206 The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in
207 this document are to be interpreted as described in [RFC2119].

209 5. Protocol Overview

211 BGMP maintains group-prefix state in response to messages from BGMP
212 peers and notifications from M-IGP components. Group-shared trees
213 are rooted at the domain advertising the group prefix covering those
214 groups. When a receiver joins a specific group address, the border
215 router towards the root domain generates a group-specific Join
216 message, which is then forwarded Border-Router-by-Border-Router
217 towards the root domain (see Figure 1). BGMP Join and Prune messages
218 are sent over TCP connections between BGMP peers, and BGMP protocol
219 state is refreshed by KeepAlive messages periodically sent over TCP.

223 BGMP routers build group-specific bidirectional forwarding state as
224 they process the BGMP Join messages. Bidirectional forwarding state
225 means that packets received from any target are forwarded to all
226 other targets in the target list without any RPF checks. No group-
227 specific state or traffic exists in parts of the network where there
228 are no members of that group.
230 BGMP routers optionally build source-specific unidirectional
231 forwarding state, only where needed, to be compatible with source-
232 specific trees (SPTs) used by some M-IGPs (e.g., DVMRP, PIM-DM, or
233 PIM-SM), or to construct trees for source-specific groups. A domain
234 that uses an SPT-based M-IGP may need to inject multicast packets
235 from external sources via different border routers (to be compatible
236 with the M-IGP RPF checks) which thus act as "surrogates". For
237 example, in the Transit_1 domain, data from Src_A arrives at BR12,
238 but must be injected by BR11. A surrogate router may create a
239 source-specific BGMP branch if no shared tree state exists. Note:
240 stub domains with a single border router, such as Rcvr_Stub_7 in
241 Figure 1, receive all multicast data packets through that router, to
242 which all RPF checks point. Therefore, stub domains never build
243 source-specific state.

245             Root_Domain
246               [BR91]--------------------------\
247                  |                            |
248               [BR32]                       [BR41]
249              Transit_3                    Transit_4
250               [BR31]                    [BR42] [BR43]
251                  |                         |      |
252               [BR22]                    [BR52] [BR53]
253              Transit_2                    Transit_5
254               [BR21]                       [BR51]
255                  |                            |
256               [BR12]                       [BR61]
257            Transit_1[BR11]----------[BR62]Stub_6
258               [BR13]                       (Src_A)
259                  |                         (Rcvr_D)
260         -------------------
261         |                 |
262      [BR71]            [BR81]
263    Rcvr_Stub_7     Src_only_Stub_8
264     (Rcvr_C)           (Src_B)

266 Figure 1: Example inter-domain topology. [BRXY] represents a BGMP
267 border router. Transit_X is a transit domain network. *_Stub_X
271 is a stub domain network.

274 Data packets are forwarded based on a combination of BGMP and M-IGP
275 rules. The router forwards to a set of targets according to a
276 matching (S,G) BGMP tree state entry if it exists. If not found, the
277 router checks for a matching (*,G) BGMP tree state entry.
If neither
278 is found, then the packet is sent natively to the next-hop EGP peer
279 for G, according to the Multicast RIB (for example, in the case of a
280 non-member sender such as Src_B in Figure 1). If a matching entry
281 was found, the packet is forwarded to all other targets in the target
282 list. In this way BGMP trees forward data in a bidirectional manner.
283 If a target is an M-IGP component then forwarding is subject to the
284 rules of that M-IGP protocol.

286 5.1. Design Rationale

288 Several other protocols, or protocol proposals, build shared trees
289 within domains [PIM-SM, CBT]. The design choices made for BGMP
290 result from our focus on Inter-Domain multicast in particular. The
291 design choices made by PIM-SM and CBT are better suited to the wide-
292 area intra-domain case. There are three major differences between
293 BGMP and other shared-tree protocols:

295 (1) Unidirectional vs. Bidirectional trees

297 Bidirectional trees (using bidirectional forwarding state as
298 described above) minimize third party dependence which is essential
299 in the inter-domain context. For example, in Figure 1, stub domains
300 7 and 8 would like to exchange multicast packets without being
301 dependent on the quality of connectivity of the root domain.
302 However, unidirectional shared trees (i.e., those using RPF checks)
303 have more aggressive loop prevention and share the same processing
304 rules as source-specific entries which are inherently unidirectional.

306 The lack of third party dependence concerns in the INTRA domain case
307 reduces the incentive to employ bidirectional trees. BGMP supports
308 bidirectional trees because it has to, and because it can without
309 excessive cost.
311 (2) Source-specific distribution trees/branches

313 In a departure from other shared tree protocols, source-specific BGMP
317 state is built ONLY where (a) it is needed to pull the multicast
318 traffic down to a BGMP router that has source-specific (S,G) state,
319 and (b) that router is NOT already on the shared tree (i.e., has no
320 (*,G) state), and (c) that router does not want to receive packets
321 via encapsulation from a router which is on the shared tree. BGMP
322 provides source-specific branches because most M-IGP protocols in use
323 today build source-specific trees. BGMP's source-specific branches
324 eliminate the unnecessary overhead of encapsulations for high data
325 rate sources from the shared tree's ingress router to the surrogate
326 injector (e.g., from BR12 to BR11 in Figure 1). Moreover, cases in
327 which shared paths are significantly longer than SPT paths will also
328 benefit.

330 However, except for source-specific group distribution trees, we do
331 not build source-specific inter-domain trees in general because (a)
332 inter-domain connectivity is generally less rich than intra-domain
333 connectivity, so shared distribution trees should have more
334 acceptable path length and traffic concentration properties in the
335 inter-domain context than in the intra-domain case, and (b) by
336 having the shared tree state always take precedence over source-
337 specific tree state, we avoid ambiguities that can otherwise arise.

339 In summary, BGMP trees are, in a sense, a hybrid between PIM-SM and
340 CBT trees.

342 (3) Method of choosing root of group shared tree

344 The choice of a group's shared-tree-root has implications for
345 performance and policy. In the intra-domain case it is sometimes
346 assumed that all potential shared-tree roots (RPs/Cores) within the
347 domain are equally suited to be the root for a group that is
348 initiated within that domain.
In the INTER-domain case, there is far
349 more opportunity for unacceptably poor locality, and administrative
350 control of a group's shared-tree root. Therefore in the intra-domain
351 case, other protocols sometimes treat all candidate roots (RPs or
352 Cores) as equivalent and emphasize load sharing and stability to
353 maximize performance. In the Inter-Domain case, not all roots are
354 equivalent, and we adopt an approach whereby a group's root domain is
355 not random but is subject to administrative control.

357 6. Protocol Details

359 In this section, we describe the detailed protocol that border
360 routers perform. We assume that each border router conforms to the
364 component-based model described in [INTEROP], modulo one correction
365 to section 3.2 ("BGMP" Dispatcher), as follows:

367    The iif owner of a (*,G) entry is the component owning the next-hop
368    interface towards the nominal root of G, in the multicast RIB.

370 6.1. Interaction with the EGP

372 The fundamental requirements imposed by BGMP are that:

374 (1) For a given source-specific group and source, BGMP must be able
375     to look up the next-hop towards the source in the Multicast RIB,
376     and

378 (2) For a given non-source-specific group, BGMP will map the group
379     address to a nominal "root" address, and must be able to look up
380     the next-hop towards that address in the Multicast RIB.

382 BGMP determines the nominal "root" address as follows. If the multicast
383 address is a Unicast-Prefix-based Multicast address (for either IPv4 or
384 IPv6), then the nominal root address is the embedded unicast prefix,
385 padded with a suffix of 0 bits to form a full address.

387 For example, if the IPv6 group address is
388 ff2e:0100:1234:5678:9abc:def0::123, then the unicast prefix is
389 1234:5678:9abc:def0::/64, and the nominal root address would be
390 1234:5678:9abc:def0::. (This address is in fact the Subnet-Router
391 anycast address [IPv6AA].)
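
The IPv6 derivation in the example above can be sketched in a few lines
of Python. This is an illustration only and not part of the protocol.
The function name nominal_root_v6 is hypothetical, and the sketch
assumes that the embedded unicast prefix occupies the 64 bits that
follow the first 32 bits of the group address, and that the prefix
length is supplied by the caller rather than parsed from the address.

```python
import ipaddress


def nominal_root_v6(group: str, prefix_len: int = 64) -> ipaddress.IPv6Address:
    """Return the nominal root address for a unicast-prefix-based group.

    Assumption (not taken from the specification text): the embedded
    prefix field is the 64 bits following the first 32 bits of the
    group address, and prefix_len is given by the caller.
    """
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError("not an IPv6 multicast address: %s" % group)
    if not 0 <= prefix_len <= 64:
        raise ValueError("prefix length must be between 0 and 64")

    # Drop the 32-bit group ID, then keep the 64-bit prefix field.
    prefix_field = (int(addr) >> 32) & ((1 << 64) - 1)
    # Keep only the leading prefix_len bits of that field.
    mask = ((1 << prefix_len) - 1) << (64 - prefix_len)
    # Pad with a suffix of 0 bits to form a full 128-bit address.
    return ipaddress.IPv6Address((prefix_field & mask) << 64)


print(nominal_root_v6("ff2e:0100:1234:5678:9abc:def0::123"))
```

With the group address from the example, the sketch yields
1234:5678:9abc:def0::, which matches the nominal root address given in
the text.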
393 As an IPv4 example, if the IPv4 group address were 225.1.2.3, then the
394 nominal root address would be 1.2.3.0.

396 Support for any-source-multicast using any address other than a Unicast-
397 prefix-based Multicast Address is outside the scope of this document.

399 6.2. Multicast Data Packet Processing

401 For BGMP rules to be applied, an incoming packet must first be
402 "accepted":

404 o If the packet arrived on an interface owned by an M-IGP, the M-IGP
405   component determines whether the packet should be accepted or
406   dropped according to its rules. If the packet is accepted, the
410   packet is forwarded (or not forwarded) out any other interfaces
411   owned by the same component, as specified by the M-IGP.

413 o If the packet was received over a point-to-point interface owned
414   by BGMP, the packet is accepted.

416 o If the packet arrived on a multiaccess network interface owned by
417   BGMP, the packet is accepted if the router is receiving data on a
418   source-specific branch, if it is the designated forwarder for the
419   longest matching route for S, or if it is the designated forwarder
420   for the longest matching route for the nominal root of G.

422 If the packet is accepted, then the router checks the tree state
423 table for a matching (S,G) entry. If one is found, but the packet
424 was not received from the next hop target towards S (if the entry's
425 SPT bit is True), or was not received from the next hop target
426 towards G (if the entry's SPT bit is False), then the packet is
427 dropped and no further actions are taken. If no (S,G) entry was
428 found, the router then checks for a matching (*,G) entry.

430 If neither is found, then the packet is forwarded towards the next-
431 hop peer for the nominal root of G, according to the Multicast RIB.
432 If a matching entry was found, the packet is forwarded to all other
433 targets in the target list.
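
The lookup order described above for an accepted packet can be sketched
as follows. This is a simplified illustration under stated assumptions,
not a definitive implementation: the names TreeEntry and
forward_targets are hypothetical, targets are plain strings, entries
are matched on exact (S,G) and (*,G) keys rather than on prefixes, and
the next-hop targets towards S and towards the nominal root of G are
passed in instead of being looked up in the Multicast RIB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TreeEntry:
    """One tree state table entry; spt is only meaningful for (S,G)."""
    targets: List[str] = field(default_factory=list)
    spt: bool = False


# Key is (source, group); a source of None stands for "*" in (*,G).
TreeStateTable = Dict[Tuple[Optional[str], str], TreeEntry]


def forward_targets(table: TreeStateTable, source: str, group: str,
                    arrived_from: str, next_hop_to_source: str,
                    next_hop_to_root: str) -> List[str]:
    """Return the targets an accepted packet is forwarded to."""
    entry = table.get((source, group))
    if entry is not None:
        # (S,G) match: the packet must come from the expected target.
        expected = next_hop_to_source if entry.spt else next_hop_to_root
        if arrived_from != expected:
            return []  # dropped, no further actions
    else:
        entry = table.get((None, group))
        if entry is None:
            # Neither (S,G) nor (*,G): send towards the nominal root.
            return [next_hop_to_root]
    # Matching entry: forward to all other targets in the target list.
    return [t for t in entry.targets if t != arrived_from]


table: TreeStateTable = {(None, "G"): TreeEntry(["BR21", "m-igp", "BR71"])}
print(forward_targets(table, "S", "G", "m-igp", "BR62", "BR21"))
```

In this example call, a packet arriving from the M-IGP component is
sent to the two remaining targets of the (*,G) entry, with no RPF check
applied.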
435 Forwarding to a target which is an M-IGP component means that the
436 packet is forwarded out any interfaces owned by that component
437 according to that component's multicast forwarding rules.

439 6.3. BGMP processing of Join and Prune messages and notifications

441 6.3.1. Receiving Joins

443 When the BGMP component receives a (*,G) or (S,G) Join alert from
444 another component, or a BGMP (S,G) or (*,G) Join message from an
445 external peer, it searches the tree state table for a matching entry.
446 If an entry is found, and that peer is already listed in the target
447 list, then no further actions are taken.

449 Otherwise, if no (*,G) or (S,G) entry was found, one is created. In
450 the case of a (*,G), the target list is initialized to contain the
451 next-hop peer towards the nominal root of G, if it is an external
452 peer. If the peer is internal, the target list is initialized to
453 contain the M-IGP component owning the next-hop interface. If there
457 is no next-hop peer (because the nominal root of G is inside the
458 domain), then the target list is initialized to contain the next-hop
459 component. If an (S,G) entry exists for the same G for which the
460 (*,G) Join is being processed, and the next-hop peers toward S and
461 the nominal root of G are different, the BGMP router must first send
462 an (S,G) Prune message toward the source and clear the SPT bit on the
463 (S,G) entry, before activating the (*,G) entry.

465 When creating (S,G) state, if the source is internal to the BGMP
466 speaker's domain, a "Poison-Reverse" bit (PR-bit) is set. This bit
467 indicates that the router may receive packets matching (S,G) anyway
468 due to the BGMP speaker being a member of a domain on the path
469 between S and the root domain. (Depending on the M-IGP protocol, it
470 may in fact receive such packets anyway only if it is the best exit
471 for the nominal root of G.)
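
The three cases for initializing the target list of a newly created
(*,G) entry can be sketched as below. This is only an illustration:
the NextHop record and the function initial_star_g_targets are
hypothetical names, and the result of the Multicast RIB lookup towards
the nominal root of G is assumed to be available already.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NextHop:
    """Assumed result of a Multicast RIB lookup for the nominal root."""
    component: str              # component owning the next-hop interface
    peer: Optional[str] = None  # None when the root is inside the domain
    external: bool = False      # True when the next-hop peer is external


def initial_star_g_targets(next_hop: NextHop) -> List[str]:
    """Initial target list for a newly created (*,G) entry."""
    if next_hop.peer is not None and next_hop.external:
        # External next-hop peer: the peer itself is the first target.
        return [next_hop.peer]
    # Internal next-hop peer, or no next-hop peer at all: the component
    # owning the next-hop interface is the first target.
    return [next_hop.component]


print(initial_star_g_targets(NextHop("bgmp", peer="BR32", external=True)))
print(initial_star_g_targets(NextHop("pim-sm", peer="BR11")))
print(initial_star_g_targets(NextHop("pim-sm")))
```

The internal-peer case and the no-peer case share one branch because
the text assigns both to the component owning the next-hop interface.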
473 The target from which the Join was received is then added to the
474 target list. The router then looks up S or the nominal root of G in
475 the Multicast RIB to find the next-hop EGP peer. If the target list,
476 not including the next-hop target towards G for a (*,G) entry,
477 becomes non-null as a result, the next-hop EGP peer must be notified
478 as follows:

480 a) If the next-hop peer towards the nominal root of G (for a (*,G)
481    entry) is an external peer, a BGMP (*,G) Join message is unicast
482    to the external peer. If the next-hop peer towards S (for an
483    (S,G) entry) is an external peer, and the router does NOT have any
484    active (*,G) state for that group address G, a BGMP (S,G) Join
485    message is unicast to the external peer. A BGMP (S,G) Join
486    message is never sent to an external peer by a router that also
487    contains active (*,G) state for the same group. If the next-hop
488    peer towards S (for an (S,G) entry) is an external peer and the
489    router DOES have active (*,G) state for that group G, the SPT bit
490    is always set to False.

492 b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join
493    alert is sent to the M-IGP component owning the next-hop
494    interface.

496 c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent
497    to the M-IGP component owning the next-hop interface.

499 Finally, if an (S,G) Join is received from an internal peer, the
500 peer should be stored with the M-IGP component target. If (S,G)
504 state exists with the PR-bit set, and the next-hop towards the
505 nominal root for G is through the M-IGP component, an (S,G)
506 Poison-Reverse message is immediately sent to the internal peer.
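
Rules a) through c) above can be summarized as a small decision
function. The sketch below is an illustration under stated
assumptions: the function name upstream_join_action and its string
results are hypothetical, and the caller is assumed to have already
established that the target list became non-null.

```python
from typing import Optional, Tuple


def upstream_join_action(is_star_g: bool, next_hop_external: bool,
                         has_active_star_g: bool = False
                         ) -> Optional[Tuple[str, str]]:
    """Return (what is sent, where it is sent), or None if nothing is sent.

    is_star_g         -- True for a (*,G) entry, False for an (S,G) entry
    next_hop_external -- True if the next-hop peer is an external peer;
                         False covers both an internal peer (rule b)
                         and no next-hop peer (rule c)
    has_active_star_g -- for (S,G) only: active (*,G) state exists for G
    """
    kind = "(*,G)" if is_star_g else "(S,G)"
    if not next_hop_external:
        # Rules b) and c): alert the M-IGP component.
        return (kind + " Join alert", "M-IGP component")
    if is_star_g:
        # Rule a), (*,G) entry: BGMP Join to the external peer.
        return ("(*,G) Join message", "external peer")
    if has_active_star_g:
        # Rule a): never send an (S,G) Join externally while (*,G)
        # state is active; the SPT bit stays False in this case.
        return None
    return ("(S,G) Join message", "external peer")


print(upstream_join_action(is_star_g=False, next_hop_external=True,
                           has_active_star_g=True))
```

The example call prints None, reflecting that an (S,G) Join is
suppressed towards an external peer when (*,G) state is active for the
same group.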
508 If an (S,G) Join is received from an external peer, and (S,G)
509 state exists with the PR-bit set, and the local BGMP speaker is
510 the best exit for the nominal root of G, and the next-hop towards
511 the nominal root for G is through the interface towards the
512 external peer, an (S,G) Poison-Reverse message is immediately sent
513 to the external peer.

515 6.3.2. Receiving Prune Notifications

517 When the BGMP component receives a (*,G) or (S,G) Prune alert from
518 another component, or a BGMP (*,G) or (S,G) Prune message from an
519 external peer, it searches the tree state table for a matching entry.
520 If no (S,G) entry was found for an (S,G) Prune, but (*,G) state
521 exists, an (S,G) entry is created, with the target list copied from
522 the (*,G) entry. If no matching entry exists, or if the component or
523 peer is not listed in the target list, no further actions are taken.

525 Otherwise, the component or peer is removed from the target list. If
526 the target list becomes null as a result, the next-hop peer towards
527 the nominal root of G (for a (*,G) entry), or towards S (for an (S,G)
528 entry if and only if the BGMP router does NOT have any corresponding
529 (*,G) entry), must be notified as follows:

531 a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune
532    message is unicast to it.

534 b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune
535    alert is sent to the M-IGP component owning the next-hop
536    interface.

538 c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent
539    to the M-IGP component owning the next-hop interface.

541 6.3.3. Receiving Route Change Notifications

543 When a border router receives a route for a new prefix in the
544 multicast RIB, or an existing route for a prefix is withdrawn, a route
545 change notification for that prefix must be sent to the BGMP
549 component.
In addition, when the next-hop peer (according to the
550 multicast RIB) changes, a route change notification for that prefix
551 must be sent to the BGMP component.

553 In addition, in IPv4 (only), an internal route for each class-D
554 prefix associated with the domain (if any) MUST be injected into the
555 multicast RIB in the EGP by the domain's border routers.

557 When a route for a new group prefix is learned, or an existing route
558 for a group prefix is withdrawn, or the next-hop peer for a group
559 prefix changes, a BGMP router updates all affected (*,G) target
560 lists. The router sends a (*,G) Join to the new next-hop target, and
561 a (*,G) Prune to the old next-hop target, as appropriate. In
562 addition, if any (S,G) state exists with the PR-bit set:

564 o If the BGMP speaker has just become the best exit for the nominal
565   root of G, an (S,G) Poison-Reverse message with the PR-bit set is
566   sent as noted below.

568 o If the BGMP speaker was the best exit for the nominal root of G
569   and is no longer, an (S,G) Poison-Reverse message with the PR-bit
570   clear is sent as noted below.

571 The (S,G) Poison-Reverse messages are sent to all external peers on
572 the next-hop interface towards the nominal root of G from which (S,G)
573 Joins have been received.

575 When an existing route for a source prefix is withdrawn, or the next-
576 hop peer for a source prefix changes, a BGMP router updates all
577 affected (S,G) target lists. The router sends an (S,G) Join to the
578 new next-hop target, and an (S,G) Prune to the old next-hop target, as
579 appropriate.

581 6.3.4. Receiving (S,G) Poison-Reverse messages

583 When a BGMP speaker receives an (S,G) Poison-Reverse message from a
584 peer, it sets the PR-bit on the (S,G) state to match the PR-bit in
585 the message, and looks up the next-hop towards the nominal root of G.
   If the next-hop target is an M-IGP component, it forwards the (S,G)
   Poison-Reverse message to all internal peers of that component from
   which it has received (S,G) Joins. If the next-hop target is an
   external peer on a given interface, it forwards the (S,G)
   Poison-Reverse message to all external peers on that interface.

   When a BGMP speaker receives an (S,G) Poison-Reverse message from
   an external peer, with the PR-bit set, and the speaker has received
   no (S,G) Joins from any other peers (e.g., only from the M-IGP, or
   has (S,G) state due to encapsulation as described in Section
   6.4.1), it knows that its own (S,G) Join is unnecessary, and should
   send an (S,G) Prune.

   When a BGMP speaker receives an (S,G) Poison-Reverse message from
   an internal peer, with the PR-bit set, and the speaker is the best
   exit for the nominal root of G, and has (S,G) prune state, an (S,G)
   Join message is sent to cancel the prune state and the state is
   deleted.

6.4. Interaction with M-IGP components

   When an M-IGP component on a border router first learns that there
   are internally-reached members for a group G (whose scope is larger
   than that domain), a (*,G) Join alert is sent to the BGMP
   component. Similarly, when an M-IGP component on a border router
   learns that there are no longer internally-reached members for a
   group G (whose scope is larger than a single domain), a (*,G) Prune
   alert is sent to the BGMP component.

   At any time, any M-IGP domain MAY decide to join a source-specific
   branch for some external source S and group G. When the M-IGP
   component in the border router that is the next-hop router for a
   particular source S learns that a receiver wishes to receive data
   from S on a source-specific path, an (S,G) Join alert is sent to
   the BGMP component.
   When it is learned that such receivers no longer exist, an (S,G)
   Prune alert is sent to the BGMP component. Recall that the BGMP
   component will generate external source-specific Joins only where
   the source-specific branch does not coincide with the shared
   distribution tree for that group.

   Finally, we require that the border router that is the next-hop
   internal peer for a particular address S or the nominal root of G
   be able to forward data for a matching tree state table entry to
   all members within the domain. This requirement has implications
   for specific M-IGPs as follows.

6.4.1. Interaction with DVMRP and PIM-DM

   DVMRP and PIM-DM are both "broadcast and prune" protocols in which
   every data packet must pass an RPF check against the packet's
   source address, or be dropped. If the border router receiving
   packets from an external source is the only BR to inject the route
   for the source into the domain, then there are no problems. For
   example, this will always be true for stub domains with a single
   border router (see Figure 1). Otherwise, the border router
   receiving packets externally is responsible for encapsulating the
   data to any other border routers that must inject the data into the
   domain for RPF checks to succeed.

   When an intended border router injector for a source receives
   encapsulated packets from another border router in its domain, it
   should create source-specific (S,G) BGMP state. Note that the
   border router may be configured to do this on a data-rate triggered
   basis so that the state is not created for very low
   data-rate/intermittent sources. If source-specific state is
   created, then its incoming interface points to the virtual
   encapsulation interface from the border router that forwarded the
   packet, and it has an SPT flag that is initialized to False.

   When the (S,G) BGMP state is created, the BGMP component will in
   turn send a BGMP (S,G) Join message to the next-hop external peer
   towards S if there is no (*,G) state for that same group, G. The
   (S,G) BGMP state will have the SPT bit set to False if (*,G) BGMP
   state is present.

   When the first data packet from S arrives from the external peer
   and matches on the BGMP (S,G) state, and IF there is no (*,G)
   state, the router sets the SPT flag to True, resets the incoming
   interface to point to the external peer, and sends a BGMP (S,G)
   Prune message to the border router that was encapsulating the
   packets (e.g., in Figure 1, BR11 sends the (Src_A,G) Prune to
   BR12). When the border router with (*,G) state receives the prune
   for (S,G), it then deletes that border router from its list of
   targets.

   If the decapsulator receives an (S,G) Poison-Reverse message with
   the PR-bit set, it will forward it to the encapsulator (which may
   again forward it up the shared tree according to normal BGMP
   rules), and both will delete their BGMP (S,G) state.

   PIM-DM and DVMRP present an additional problem, i.e., no protocol
   mechanism exists for joining and pruning entire groups; only joins
   and prunes for individual sources are available. If any such domain
   desires to be able to serve as a transit domain, we require that
   some form of Domain-Wide Reports (DWRs) [DWR] is available within
   such domains. Such messages provide the ability to join and prune
   an entire group across the domain. One simple heuristic to
   approximate DWRs is to assume that if there are any
   internally-reached members, then at least one of them is a sender.
   With this heuristic, the presence of any M-IGP (S,G) state for
   internally-reached sources can be used instead. Sending a data
   packet to a group is then equivalent to sending a DWR for the
   group.

6.4.2. Interaction with PIM-SM

   Protocols such as PIM-SM build unidirectional shared and
   source-specific trees. As with DVMRP and PIM-DM, every data packet
   must pass an RPF check against some group-specific or
   source-specific address.

   The fewest encapsulations/decapsulations will be done when the
   intra-domain tree is rooted at the next-hop internal peer (which
   becomes the RP) towards the nominal root of G, since in general
   that router will receive the most packets from external sources. To
   achieve this, each BGMP border router to a PIM-SM domain should
   send Candidate-RP-Advertisements within the domain for those groups
   for which it is the shared-domain tree ingress router. When the
   border router that is the RP for a group G receives an external
   data packet, it forwards the packet according to the M-IGP (i.e.,
   PIM-SM) shared-tree outgoing interface list.

   Other border routers will receive data packets from external
   sources that are farther down the bidirectional tree of domains.
   When a border router that is not the RP receives an external packet
   for which it does not have a source-specific entry, the border
   router treats it like a local source by creating (S,G) state with a
   Register flag set, based on normal PIM-SM rules; the border router
   then encapsulates the data packets in PIM-SM Registers and unicasts
   them to the RP for the group. As explained above, the RP for the
   inter-domain group will be one of the other border routers of the
   domain.

   If a source's data rate is high enough, DRs within the PIM-SM
   domain may switch to the shortest path tree. If the shortest path
   to an external source is via the group's ingress router for the
   shared tree, the new (S,G) state in the BGMP border router will not
   cause BGMP (S,G) Joins because that border router will already have
   (*,G) state.
   If, however, the shortest path to an external source is via some
   other border router, that border router will create (S,G) BGMP
   state in response to the M-IGP (S,G) Join alert. In this case,
   because there is no local (*,G) state to suppress it, the border
   router will send a BGMP (S,G) Join to the next-hop external peer
   towards S, in order to pull the data down directly. (See BR11 in
   Figure 1.) As in normal PIM-SM operation, those PIM-SM routers that
   have (*,G) and (S,G) state pointing to different incoming
   interfaces will prune that source off the shared tree. Therefore,
   all internal interfaces may eventually be pruned off the internal
   shared tree.

   After the border router sends a BGMP (S,G) Join, if its (S,G) state
   has the PR-bit clear, an (S,G) Poison-Reverse message (with the
   PR-bit clear) is sent to the ingress router for G. The ingress
   router then creates (S,G) state if it does not already exist, and
   removes the next hop towards the nominal root of G from the target
   list.

   If the border router later receives an (S,G) Poison-Reverse message
   with the PR-bit set, the Poison-Reverse message is forwarded to the
   ingress router for G. The best-exit router then creates (S,G) state
   if it does not already exist, and puts the next hop towards the
   nominal root of G in the target list if not already present.

6.4.3. Interaction with CBT

   CBT builds bidirectional shared trees but must address two points
   of compatibility with BGMP. First, CBT cannot accommodate more than
   one border router injecting a packet. Therefore, if a CBT domain
   does have multiple external connections, the M-IGP components of
   the border routers are responsible for ensuring that only one of
   them will inject data from any given source.

   Second, CBT cannot process source-specific Joins or Prunes.
   Two options thus exist for each CBT domain:

   Option A:
      The CBT component interprets an (S,G) Join alert as if it were a
      (*,G) Join alert, as described in [INTEROP]. That is, if it is
      not already on the core-tree for G, then it sends a CBT (*,G)
      JOIN-REQUEST message towards the core for G. Similarly, when the
      CBT component receives an (S,G) Prune alert, and the child
      interface list for a group is NULL, then it sends a (*,G)
      QUIT_NOTIFICATION towards the core for G. This option has the
      disadvantage of pulling all data for the group G down to the CBT
      domain when no members exist.

   Option B:
      The CBT domain does not propagate any source routes (i.e.,
      non-class D routes) to its external peers for the Multicast RIB
      unless it is known that no other path exists to that prefix
      (e.g., routes for prefixes internal to the domain or in a
      singly-homed customer's domain may be propagated). This ensures
      that source-specific joins are never received unless the
      source's data already passes through the domain on the shared
      tree, in which case the (S,G) Join need not be propagated
      anyway. BGMP border routers will only send source-specific Joins
      or Prunes to an external peer if that external peer advertises
      source-prefixes in the EGP. If a BGMP-CBT border router does
      receive an (S,G) Join or Prune, that border router should ignore
      the message.

      To minimize en/de-capsulations, CBTv2 BRs may follow the same
      scheme as described under PIM-SM above, in which Candidate-Core
      advertisements are sent for those groups for which the BR is the
      shared-tree ingress router.

6.4.4. Interaction with MOSPF

   As with CBTv2, MOSPF cannot process source-specific Joins or
   Prunes, and the same two options are available.
   Therefore, an MOSPF domain may either:

   Option A:
      send a Group-Membership-LSA for all of G in response to an (S,G)
      Join alert, and "prematurely age" it out (when no other
      downstream members exist) in response to an (S,G) Prune alert,
      OR

   Option B:
      not propagate any source routes (i.e., non-class D routes) to
      its external peers for the Multicast RIB unless it is known that
      no other path exists to that prefix (e.g., routes for prefixes
      internal to the domain or in a singly-homed customer's domain
      may be propagated).

6.5. Operation over Multi-access Networks

   Multiaccess links require special handling to prevent duplicates.
   The following mechanism enables BGMP to operate over multiaccess
   links which do not run an M-IGP. This avoids broadcast-and-prune
   behavior and does not require (S,G) state.

   To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF
   message to exchange "forwarder preference" values for each prefix.
   The peer with the highest forwarder preference becomes the
   designated forwarder, with ties broken by lowest BGMP Identifier.
   The designated forwarder is the router responsible for forwarding
   packets up the tree, and is the peer to which joins will be sent.

   When BGMP first learns that a route exists in the multicast RIB
   whose next-hop interface is NOT the multiaccess link, the BGMP
   router sends a BGMP FWDR_PREF message for the prefix to all BGMP
   peers on the LAN. The FWDR_PREF message contains a "forwarder
   preference value" for the local router, and the same value MUST be
   sent to all peers on the LAN. Likewise, when the prefix is no
   longer reachable, a FWDR_PREF of 0 is sent to all peers on the LAN.

   Whenever a BGMP router calculates the next-hop peer towards a
   particular address, and that peer is reached over a BGMP-owned
   multiaccess LAN, the designated forwarder is used instead.
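
   The election rule above (highest forwarder preference, ties broken
   by lowest BGMP Identifier) can be sketched as follows. This is a
   minimal illustration, not part of the protocol specification. The
   names PeerPref and elect_forwarder are hypothetical, and treating a
   preference of 0 as "not a candidate" is an assumption drawn from
   the rule that a FWDR_PREF of 0 is sent when the prefix is no longer
   reachable.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PeerPref:
    """One peer's advertised forwarder preference for a prefix (hypothetical type)."""
    bgmp_id: int      # BGMP Identifier, compared as an unsigned integer
    preference: int   # value carried in FWDR_PREF


def elect_forwarder(peers: Iterable[PeerPref]) -> Optional[PeerPref]:
    """Return the designated forwarder, or None if no peer is a candidate."""
    # Assumption: a preference of 0 means the peer cannot reach the prefix.
    candidates = [p for p in peers if p.preference > 0]
    if not candidates:
        return None
    # Highest preference wins; ties go to the lowest BGMP Identifier.
    return min(candidates, key=lambda p: (-p.preference, p.bgmp_id))
```

   Every router on the LAN evaluates the same inputs with the same
   rule, so all routers reach the same answer without further
   negotiation.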

   When a BGMP router receives a FWDR_PREF message from a peer, it
   looks up the matching route in its multicast RIB, and calculates
   the new designated forwarder. If the router has tree state entries
   whose parent target was the old forwarder, it sends Joins to the
   new forwarder and Prunes to the old forwarder.

   When a BGMP router which is NOT the designated forwarder receives a
   packet on the multiaccess link, it is silently dropped.

   Finally, this mechanism prevents duplicates where full peering
   exists on a "logical" link. Where full peering does not exist,
   steps must be taken (outside of BGMP) to present separate logical
   interfaces to BGMP, each of which is a link with full peering. This
   might entail, for example, using different link-layer address
   mappings, doing encapsulation, or changing the physical media.

6.6. Interaction between (S,G) state and G-routes

   As discussed earlier, routers with (*,G) state will not propagate
   (S,G) joins. However, a special case occurs when (S,G) state
   coincides with the G-route (or route towards the nominal root of
   G). When this occurs, care must be taken so that the data will
   reach the root domain without causing duplicates or black holes.
   For this reason, (S,G) state on the path between the source and the
   root domain is annotated as being "poison-reversed". A PR-bit is
   kept for this purpose, which is updated by (UN)POISON_REVERSE
   messages.

7. Message Formats

   This section describes message formats used by BGMP.

   Messages are sent over a reliable transport protocol connection. A
   message is processed only after it is entirely received. The
   maximum message size is 4096 octets. All implementations are
   required to support this maximum message size.

   All fields labelled "Reserved" below must be transmitted as 0, and
   ignored upon receipt.

7.1. Message Header Format

   Each message has a fixed-size (4-byte) header. There may or may not
   be a data portion following the header, depending on the message
   type. The layout of these fields is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |      Type     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Length:
      This 2-octet unsigned integer indicates the total length of the
      message, including the header, in octets. Thus, e.g., it allows
      one to locate in the transport-level stream the start of the
      next message. The value of the Length field must always be at
      least 4 and no greater than 4096, and may be further
      constrained, depending on the message type. No "padding" of
      extra data after the message is allowed, so the Length field
      must have the smallest value required given the rest of the
      message.

   Type:
      This 1-octet unsigned integer indicates the type code of the
      message. The following type codes are defined:

         1 - OPEN
         2 - UPDATE
         3 - NOTIFICATION
         4 - KEEPALIVE

7.2. OPEN Message Format

   After a transport protocol connection is established, the first
   message sent by each side is an OPEN message. If the OPEN message
   is acceptable, a KEEPALIVE message confirming the OPEN is sent
   back. Once the OPEN is confirmed, UPDATE, KEEPALIVE, and
   NOTIFICATION messages may be exchanged.
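
   The fixed header of Section 7.1, together with the rule that a
   message is processed only after it is entirely received, can be
   sketched as follows. This is an illustrative sketch only. The
   function names parse_header and split_messages are hypothetical,
   and network byte order for the Length field is an assumption (the
   text does not state it explicitly).

```python
import struct
from typing import List, Tuple

HEADER_LEN = 4
MAX_MESSAGE_LEN = 4096
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_header(buf: bytes) -> Tuple[int, int]:
    """Return (length, type) from the first 4 octets of a BGMP message."""
    if len(buf) < HEADER_LEN:
        raise ValueError("need at least 4 octets for the header")
    # Length: 2 octets, Type: 1 octet, Reserved: 1 octet (ignored on receipt).
    length, msg_type, _reserved = struct.unpack("!HBB", buf[:HEADER_LEN])
    if not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
        raise ValueError(f"bad message length {length}")
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"bad message type {msg_type}")
    return length, msg_type


def split_messages(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split a byte stream into complete messages plus the unconsumed tail."""
    messages = []
    while len(stream) >= HEADER_LEN:
        length, _ = parse_header(stream)
        if len(stream) < length:
            break  # message not entirely received yet
        messages.append(stream[:length])
        stream = stream[length:]
    return messages, stream
```

   A receiver would typically keep the returned tail and prepend it to
   the next chunk read from the transport connection.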

   In addition to the fixed-size BGMP header, the OPEN message
   contains the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    | Rsvd| AddrFam |           Hold Time           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               BGMP Identifier (variable length)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                     (Optional Parameters)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Version:
      This 1-octet unsigned integer indicates the protocol version
      number of the message. The current BGMP version number is 1.

   AddrFam:
      The IANA-assigned address family number of the BGMP Identifier.
      These include (among others):

         Number    Description
         ------    -----------
            1      IP (IP version 4)
            2      IPv6 (IP version 6)

   Hold Time:
      This 2-octet unsigned integer indicates the number of seconds
      that the sender proposes for the value of the Hold Timer. Upon
      receipt of an OPEN message, a BGMP speaker MUST calculate the
      value of the Hold Timer by using the smaller of its configured
      Hold Time and the Hold Time received in the OPEN message. The
      Hold Time MUST be either zero or at least three seconds. An
      implementation may reject connections on the basis of the Hold
      Time. The calculated value indicates the maximum number of
      seconds that may elapse between the receipt of successive
      KEEPALIVE, and/or UPDATE messages by the sender.

   BGMP Identifier:
      This 4-octet (for IPv4) or 16-octet (IPv6) unsigned integer
      indicates the BGMP Identifier of the sender. A given BGMP
      speaker sets the value of its BGMP Identifier to a
      globally-unique value assigned to that BGMP speaker (e.g., an
      IPv4 address).
      The value of the BGMP Identifier is determined on startup and is
      the same for every BGMP session opened.

   Optional Parameters:
      This field may contain a list of optional parameters, where each
      parameter is encoded as a <Parameter Type, Parameter Length,
      Parameter Value> triplet. The combined length of all optional
      parameters can be derived from the Length field in the message
      header.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
      |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

      Parameter Type is a one octet field that unambiguously
      identifies individual parameters. Parameter Length is a one
      octet field that contains the length of the Parameter Value
      field in octets. Parameter Value is a variable length field that
      is interpreted according to the value of the Parameter Type
      field.

      This document defines the following Optional Parameters:

   a) Authentication Information (Parameter Type 1):
      This optional parameter may be used to authenticate a BGMP peer.
      The Parameter Value field contains a 1-octet Authentication Code
      followed by a variable length Authentication Data.

       0 1 2 3 4 5 6 7 8
      +-+-+-+-+-+-+-+-+
      |  Auth. Code   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                     |
      |                 Authentication Data                 |
      |                                                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Authentication Code:

         This 1-octet unsigned integer indicates the authentication
         mechanism being used. Whenever an authentication mechanism is
         specified for use within BGMP, two things must be included in
         the specification:

         - the value of the Authentication Code which indicates use of
           the mechanism, and
         - the form and meaning of the Authentication Data.

         Note that a separate authentication mechanism may be used in
         establishing the transport level connection.

      Authentication Data:

         The form and meaning of this variable-length field depend on
         the Authentication Code.

      The minimum length of the OPEN message is 12 octets (including
      message header).

   b) Capability Information (Parameter Type 2):
      This is an Optional Parameter that is used by a BGMP speaker to
      convey to its peer the list of capabilities supported by the
      speaker. The parameter contains one or more triples <Capability
      Code, Capability Length, Capability Value>, where each triple is
      encoded as shown below:

      +------------------------------+
      | Capability Code (1 octet)    |
      +------------------------------+
      | Capability Length (1 octet)  |
      +------------------------------+
      | Capability Value (variable)  |
      +------------------------------+

      Capability Code:

         Capability Code is a one octet field that unambiguously
         identifies individual capabilities.

      Capability Length:

         Capability Length is a one octet field that contains the
         length of the Capability Value field in octets.

      Capability Value:

         Capability Value is a variable length field that is
         interpreted according to the value of the Capability Code
         field.

      A particular capability, as identified by its Capability Code,
      may occur more than once within the Optional Parameter.

      This document reserves Capability Codes 128-255 for
      vendor-specific applications.

      This document reserves Capability Code 0.

      Capability Codes (other than those reserved for vendor-specific
      use) are assigned only by the IETF consensus process and IESG
      approval.

7.3. UPDATE Message Format

   UPDATE messages are used to transfer Join/Prune/FwdrPref
   information between BGMP peers. The UPDATE message always includes
   the fixed-size BGMP header, and one or more attributes as described
   below.

   The message format below allows compact encoding of (*,G) Joins and
   Prunes, while allowing the flexibility needed to do other updates
   such as (S,G) Joins and Prunes towards sources as well as on the
   shared tree. In the discussion below, an Encoded-Address-Prefix is
   of the form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |EnTyp| AddrFam |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Address (variable length)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Mask (variable length)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   EnTyp:
      0 - All 1's Mask. The Mask field is 0 bytes long.
      1 - Mask length included. The Mask field is 4 bytes long, and
          contains the mask length, in bits.
      2 - Full Mask included. The Mask field is the same length as the
          Address field, and contains the full bitmask.

   AddrFam:
      The IANA-assigned address family number of the encoded prefix.

   Address:
      The address associated with the given prefix to be encoded. The
      length is determined based on the Address Family.

   Mask:
      The mask associated with the given prefix. The format (or
      absence) of this field is determined by the EnTyp field.

   Each attribute is of the form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |      Type     |    Data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   All attributes are 4-byte aligned.

   Length:
      The Length is the length of the entire attribute, including the
      length, type, and data fields. If other attributes are nested
      within the data field, the length includes the size of all such
      nested attributes.

   Type:
      Types 128-255 are reserved for "optional" attributes. If a
      required attribute is unrecognized, a NOTIFICATION will be sent
      and the connection will be closed if the error is a fatal one.
      Unrecognized optional attributes are simply ignored.

         0 - JOIN
         1 - PRUNE
         2 - GROUP
         3 - SOURCE
         4 - FWDR_PREF
         5 - POISON_REVERSE

   a) JOIN (Type Code 0)

      The JOIN attribute indicates that all GROUP or SOURCE attributes
      nested immediately within the JOIN attribute should be joined.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=0     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      No JOIN, PRUNE, or FWDR_PREF attributes may be immediately
      nested within a JOIN attribute.

   b) PRUNE (Type Code 1)

      The PRUNE attribute indicates that all GROUP or SOURCE
      attributes nested immediately within the PRUNE attribute should
      be pruned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=1     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      No JOIN, PRUNE, or FWDR_PREF attributes may be immediately
      nested within a PRUNE attribute.

   c) GROUP (Type Code 2)

      The GROUP attribute identifies a given group-prefix. In
      addition, any attributes nested immediately within the GROUP
      attribute also apply to the given group-prefix.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=2     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Encoded-Address-Prefix
         The multicast group prefix to be joined or pruned, in the
         format described above.

      Nested Attributes
         No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
         nested within a GROUP attribute.

   d) SOURCE (Type Code 3)

      The SOURCE attribute identifies a given source-prefix. In
      addition, any attributes nested immediately within the SOURCE
      attribute also apply to the given source-prefix.

      The SOURCE attribute has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=3     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Encoded-Address-Prefix
         The source-prefix in the format described above.

      Nested Attributes
         No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
         nested within a SOURCE attribute.

   e) FWDR_PREF (Type Code 4)

      The FWDR_PREF attribute provides a forwarder preference value
      for all GROUP or SOURCE attributes nested immediately within the
      FWDR_PREF attribute.
      It is used by a BGMP speaker to inform other BGMP speakers of
      the originating speaker's degree of preference for a given group
      or source prefix. Usage of this attribute is described in
      Section 6.5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=4     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Preference Value                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Preference Value
         A 32-bit non-negative integer.

      Nested Attributes
         No JOIN, PRUNE, or FWDR_PREF attributes may be immediately
         nested within a FWDR_PREF attribute.

   f) POISON_REVERSE (Type Code 5)

      The POISON_REVERSE attribute provides a "poison-reverse"
      (PR-bit) value for all SOURCE attributes nested immediately
      within the POISON_REVERSE attribute. It is used by a BGMP
      speaker to inform other BGMP speakers from which it has received
      (S,G) Joins that they are on the path of domains between the
      source and the root domain.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=5     |  Reserved   |P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      P
         The PR-bit value.

      Nested Attributes
         No attributes defined in this document other than SOURCE may
         be immediately nested within a POISON_REVERSE attribute.

7.4. Encoding examples

   Below are enumerated examples of how various updates are built
   using nested attributes, where A ( B ) denotes that attribute B is
   nested within attribute A.

   (*,G-prefix) Join:
      JOIN ( GROUP )
   (*,G-prefix) Prune:
      PRUNE ( GROUP )
   (S,G) Join towards S:
      GROUP ( JOIN ( SOURCE ) )
   (S,G) Join cancelling prune towards root of G:
      GROUP ( JOIN ( SOURCE ) )
   (S,G) Prune towards S:
      GROUP ( PRUNE ( SOURCE ) )
   (S,G) Prune towards root of G:
      GROUP ( PRUNE ( SOURCE ) )
   Switch from (*,G) to (S,G):
      PRUNE ( GROUP ( JOIN ( SOURCE ) ) )
   Switch from (S,G) to (*,G):
      JOIN ( GROUP )
   Initial (*,G) Join with S pruned:
      JOIN ( GROUP ( PRUNE ( SOURCE ) ) )
   Forwarder preference announcement for G-prefix:
      FWDR_PREF ( GROUP )
   Forwarder preference announcement for S-prefix:
      FWDR_PREF ( SOURCE )

7.5. KEEPALIVE Message Format

   BGMP does not use any transport protocol-based keep-alive mechanism
   to determine if peers are reachable. Instead, KEEPALIVE messages
   are exchanged between peers often enough as not to cause the Hold
   Timer to expire. A reasonable maximum time between the last
   KEEPALIVE or UPDATE message sent, and the time at which a KEEPALIVE
   message is sent, would be one third of the Hold Time interval.
   KEEPALIVE messages MUST NOT be sent more frequently than one per
   second. An implementation MAY adjust the rate at which it sends
   KEEPALIVE messages as a function of the Hold Time interval.

   If the negotiated Hold Time interval is zero, then periodic
   KEEPALIVE messages MUST NOT be sent.

   A KEEPALIVE message consists of only a message header, and has a
   length of 4 octets.

7.6. NOTIFICATION Message Format

   A NOTIFICATION message is sent when an error condition is detected.
   The BGMP connection is closed immediately after sending it if the
   error is a fatal one.

In addition to the fixed-size BGMP header, the NOTIFICATION message
contains the following fields:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|O| Error code  | Error subcode |             Data              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

O-bit:
   Open-bit.  If clear, the connection will be closed.
   If set, indicates the error is not fatal.

Error Code:

   This 1-octet unsigned integer indicates the type of
   NOTIFICATION.  The following Error Codes have been defined:

   Error Code   Symbolic Name                 Reference

       1        Message Header Error          Section 8.1

       2        OPEN Message Error            Section 8.2

       3        UPDATE Message Error          Section 8.3

       4        Hold Timer Expired            Section 8.5

       5        Finite State Machine Error    Section 8.6

       6        Cease                         Section 8.7

Error subcode:

   This 1-octet unsigned integer provides more specific
   information about the nature of the reported error.  Each Error
   Code may have one or more Error Subcodes associated with it.
   If no appropriate Error Subcode is defined, then a zero
   (Unspecific) value is used for the Error Subcode field.
   The notation (MC) below indicates the error is a fatal one
   and the O-bit must be clear.  Non-fatal subcodes SHOULD
   be sent with the O-bit set.

   Message Header Error subcodes:

       2 - Bad Message Length (MC)
       3 - Bad Message Type (MC)

   OPEN Message Error subcodes:

       1 - Unsupported Version (MC)
       4 - Unsupported Optional Parameter
       5 - Authentication Failure (MC)
       6 - Unacceptable Hold Time (MC)
       7 - Unsupported Capability (MC)

   UPDATE Message Error subcodes:

       1 - Malformed Attribute List (MC)
       2 - Unrecognized Attribute Type
       5 - Attribute Length Error (MC)
      10 - Invalid Address
      11 - Invalid Mask
      13 - Unrecognized Address Family

Data:
   This variable-length field is used to diagnose the reason for the
   NOTIFICATION.  The contents of the Data field depend upon the
   Error Code and Error Subcode.  See Section 8 below for more
   details.

   Note that the length of the Data field can be determined from the
   message Length field by the formula:

      Message Length = 6 + Data Length

   The minimum length of the NOTIFICATION message is 6 octets
   (including message header).

8.  BGMP Error Handling

This section describes actions to be taken when errors are detected
while processing BGMP messages.  BGMP Error Handling is similar to
that of BGP [BGP].

When any of the conditions described here are detected, a
NOTIFICATION message with the indicated Error Code, Error Subcode,
and Data fields is sent, and the BGMP connection is closed if the
error is a fatal one.  If no Error Subcode is specified, then a zero
must be used.

The phrase "the BGMP connection is closed" means that the transport
protocol connection has been closed and that all resources for that
BGMP connection have been deallocated.  The remote peer is removed
from the target list of all tree state entries.
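
As an illustration of the NOTIFICATION layout in Section 7.6, the
following minimal sketch packs and unpacks the fields that follow the
message header.  It is not a definitive encoder.  It assumes, reading
the field diagram literally, that the O-bit is the most significant
bit of the first octet and that the Error Code occupies the remaining
seven bits.  The names Notification, encode_body, decode_body and
message_length are hypothetical, and the message header itself is
deliberately left out.

```python
from typing import NamedTuple

# From the text: the minimum NOTIFICATION length, header included, is 6.
MIN_NOTIFICATION_LEN = 6


class Notification(NamedTuple):
    open_bit: bool       # True = non-fatal, connection stays open
    error_code: int      # assumed to fit in 7 bits (see lead-in)
    error_subcode: int   # 1 octet
    data: bytes = b""    # variable-length diagnostic data


def encode_body(n: Notification) -> bytes:
    """Encode the fields that follow the fixed-size BGMP header."""
    if not 0 <= n.error_code <= 0x7F:
        raise ValueError("error code out of range")
    if not 0 <= n.error_subcode <= 0xFF:
        raise ValueError("error subcode out of range")
    first = (0x80 if n.open_bit else 0x00) | n.error_code
    return bytes([first, n.error_subcode]) + n.data


def decode_body(body: bytes) -> Notification:
    """Inverse of encode_body."""
    if len(body) < 2:
        raise ValueError("NOTIFICATION body too short")
    return Notification(bool(body[0] & 0x80), body[0] & 0x7F,
                        body[1], bytes(body[2:]))


def message_length(n: Notification) -> int:
    """Message Length = 6 + Data Length."""
    return MIN_NOTIFICATION_LEN + len(n.data)
```

For example, a fatal Bad Message Length report (Error Code 1, Error
Subcode 2, O-bit clear) whose Data carries an erroneous length of 5000
as two octets in network byte order (the width is an assumption here)
can be built as
Notification(False, 1, 2, (5000).to_bytes(2, "big")).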

Unless specified explicitly, the Data field of the NOTIFICATION
message that is sent to indicate an error is empty.

8.1.  Message Header error handling

All errors detected while processing the Message Header are indicated
by sending the NOTIFICATION message with Error Code Message Header
Error.  The Error Subcode elaborates on the specific nature of the
error.

If the Length field of the message header is less than 4 or greater
than 4096, or if the Length field of an OPEN message is less than
the minimum length of the OPEN message, or if the Length field of an
UPDATE message is less than the minimum length of the UPDATE message,
or if the Length field of a KEEPALIVE message is not equal to 4, then
the Error Subcode is set to Bad Message Length.  The Data field
contains the erroneous Length field.

If the Type field of the message header is not recognized, then the
Error Subcode is set to Bad Message Type.  The Data field contains
the erroneous Type field.

8.2.  OPEN message error handling

All errors detected while processing the OPEN message are indicated
by sending the NOTIFICATION message with Error Code OPEN Message
Error.  The Error Subcode elaborates on the specific nature of the
error.

If the version number contained in the Version field of the received
OPEN message is not supported, then the Error Subcode is set to
Unsupported Version.  The Data field is a 2-octet unsigned
integer, which indicates the largest locally supported version number
less than the version the remote BGMP peer bid (as indicated in the
received OPEN message).

If the Hold Time field of the OPEN message is unacceptable, then the
Error Subcode MUST be set to Unacceptable Hold Time.  An
implementation MUST reject Hold Time values of one or two seconds.
An implementation MAY reject any proposed Hold Time.
An implementation which accepts a Hold Time MUST use the negotiated
value for the Hold Time.

If one of the Optional Parameters in the OPEN message is not
recognized, then the Error Subcode is set to Unsupported Optional
Parameter.

If the OPEN message carries Authentication Information (as an
Optional Parameter), then the corresponding authentication procedure
is invoked.  If the authentication procedure (based on Authentication
Code and Authentication Data) fails, then the Error Subcode is set to
Authentication Failure.

If the OPEN message indicates that the peer does not support a
capability which the receiver requires, the receiver may send a
NOTIFICATION message to the peer, and terminate peering.  The Error
Subcode in the message is set to Unsupported Capability.  The Data
field in the NOTIFICATION message lists the set of capabilities that
caused the speaker to send the message.  Each such capability is
encoded the same way as it was encoded in the received OPEN message.

8.3.  UPDATE message error handling

All errors detected while processing the UPDATE message are indicated
by sending the NOTIFICATION message with Error Code UPDATE Message
Error.  The Error Subcode elaborates on the specific nature of the
error.

If any recognized attribute has an Attribute Length that conflicts
with the expected length (based on the attribute type code), then the
Error Subcode is set to Attribute Length Error.  The Data field
contains the erroneous attribute (type, length and value).

If the Encoded-Address-Prefix field in some attribute is
syntactically incorrect, then the Error Subcode is set to Invalid
Prefix Field.

If any other error is encountered when processing attributes (such as
invalid nestings), then the Error Subcode is set to Malformed
Attribute List, and the problematic attribute is included in the Data
field.

8.4.  NOTIFICATION message error handling

If a peer sends a NOTIFICATION message, and there is an error in that
message, there is unfortunately no means of reporting this error via
a subsequent NOTIFICATION message.  Any such error, such as an
unrecognized Error Code or Error Subcode, should be noticed, logged
locally, and brought to the attention of the administration of the
peer.  The means to do this, however, lies outside the scope of this
document.

8.5.  Hold Timer Expired error handling

If a system does not receive successive KEEPALIVE and/or UPDATE
and/or NOTIFICATION messages within the period specified in the Hold
Time field of the OPEN message, then the NOTIFICATION message with
Hold Timer Expired Error Code must be sent and the BGMP connection
closed.

8.6.  Finite State Machine error handling

Any error detected by the BGMP Finite State Machine (e.g., receipt of
an unexpected event) is indicated by sending the NOTIFICATION message
with Error Code Finite State Machine Error.

8.7.  Cease

In the absence of any fatal errors (that are indicated in this
section), a BGMP peer may choose at any given time to close its BGMP
connection by sending the NOTIFICATION message with Error Code Cease.
However, the Cease NOTIFICATION message must not be used when a fatal
error indicated by this section does exist.

8.8.  Connection collision detection

If a pair of BGMP speakers try simultaneously to establish a TCP
connection to each other, then two parallel connections between this
pair of speakers might well be formed.  We refer to this situation as
connection collision.
Clearly, one of these connections must be closed.

Based on the value of the BGMP Identifier, a convention is established
for detecting which BGMP connection is to be preserved when a
collision does occur.  The convention is to compare the BGMP
Identifiers of the peers involved in the collision and to retain only
the connection initiated by the BGMP speaker with the higher-valued
BGMP Identifier.

Upon receipt of an OPEN message, the local system must examine all of
its connections that are in the OpenConfirm state.  A BGMP speaker
may also examine connections in an OpenSent state if it knows the
BGMP Identifier of the peer by means outside of the protocol.  If
among these connections there is a connection to a remote BGMP
speaker whose BGMP Identifier equals the one in the OPEN message,
then the local system performs the following collision resolution
procedure:

1. The BGMP Identifier of the local system is compared to the BGMP
   Identifier of the remote system (as specified in the OPEN message).

2. If the value of the local BGMP Identifier is less than the remote
   one, the local system closes the BGMP connection that already
   exists (the one that is already in the OpenConfirm state), and
   accepts the BGMP connection initiated by the remote system.

3. Otherwise, the local system closes the newly created BGMP
   connection (the one associated with the newly received OPEN
   message), and continues to use the existing one (the one that is
   already in the OpenConfirm state).

Comparing BGMP Identifiers is done by treating them as (4-octet long)
unsigned integers.

A connection collision with an existing BGMP connection that is in
the Established state causes unconditional closing of the newly
created connection.
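
The collision resolution procedure above can be sketched as a single
decision function.  This is a minimal illustration under stated
assumptions rather than a reference implementation: the names State,
Verdict and resolve_collision are hypothetical, BGMP Identifiers are
passed as Python integers already converted from their 4-octet form,
and a connection whose peer Identifier is unknown is represented by
None.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


class Verdict(Enum):
    NO_COLLISION = "no collision"
    CLOSE_EXISTING = "close the existing connection"
    CLOSE_NEW = "close the newly created connection"


def resolve_collision(local_id: int, remote_id: int,
                      existing_state: State,
                      existing_peer_id: Optional[int]) -> Verdict:
    """Decide which connection to close when an OPEN arrives.

    remote_id is the BGMP Identifier in the received OPEN message;
    existing_peer_id is the Identifier known for the other connection
    (None if unknown, e.g. OpenSent without out-of-band knowledge).
    """
    for ident in (local_id, remote_id):
        if not 0 <= ident <= 0xFFFFFFFF:
            raise ValueError("BGMP Identifier must fit in 4 octets")
    if existing_peer_id is None or existing_peer_id != remote_id:
        return Verdict.NO_COLLISION
    if existing_state is State.ESTABLISHED:
        # An Established connection always wins.
        return Verdict.CLOSE_NEW
    if existing_state in (State.OPEN_CONFIRM, State.OPEN_SENT):
        # Steps 1-3: compare Identifiers as unsigned integers.
        if local_id < remote_id:
            return Verdict.CLOSE_EXISTING
        return Verdict.CLOSE_NEW
    return Verdict.NO_COLLISION
```

Whichever connection the verdict names is then closed as described in
the remainder of this section.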

Note that a connection collision cannot be detected with
connections that are in Idle, or Connect, or Active states.

Closing the BGMP connection (that results from the collision
resolution procedure) is accomplished by sending the NOTIFICATION
message with the Error Code Cease.

9.  BGMP Version Negotiation

BGMP speakers may negotiate the version of the protocol by making
multiple attempts to open a BGMP connection, starting with the
highest version number each supports.  If an open attempt fails with
an Error Code OPEN Message Error, and an Error Subcode Unsupported
Version, then the BGMP speaker has available the version
number it tried, the version number its peer tried, the version
number passed by its peer in the NOTIFICATION message, and the
version numbers that it supports.  If the two peers do support one or
more common versions, then this will allow them to rapidly determine
the highest common version.  In order to support BGMP version
negotiation, future versions of BGMP must retain the format of the
OPEN and NOTIFICATION messages.

9.1.  BGMP Capability Negotiation

When a BGMP speaker sends an OPEN message to its BGMP peer, the
message may include an Optional Parameter, called Capabilities.  The
parameter lists the capabilities supported by the speaker.

A BGMP speaker may use a particular capability when peering with
another speaker only if both speakers support that capability.  A
BGMP speaker determines the capabilities supported by its peer by
examining the list of capabilities present in the Capabilities
Optional Parameter carried by the OPEN message that the speaker
receives from the peer.

10.  BGMP Finite State Machine

This section specifies BGMP operation in terms of a Finite State
Machine (FSM).
Following is a brief summary and overview of BGMP
operations by state as determined by this FSM.

Initially BGMP is in the Idle state.

Idle state:

   In this state BGMP refuses all incoming BGMP connections.  No
   resources are allocated to the peer.  In response to the Start
   event (initiated by either system or operator) the local system
   initializes all BGMP resources, starts the ConnectRetry timer,
   initiates a transport connection to the other BGMP peer, while
   listening for a connection that may be initiated by the remote
   BGMP peer, and changes its state to Connect.  The exact value of
   the ConnectRetry timer is a local matter, but should be
   sufficiently large to allow TCP initialization.

   If a BGMP speaker detects an error, it shuts down the connection
   and changes its state to Idle.  Getting out of the Idle state
   requires generation of the Start event.  If such an event is
   generated automatically, then persistent BGMP errors may result in
   persistent flapping of the speaker.  To avoid such a condition it
   is recommended that Start events should not be generated
   immediately for a peer that was previously transitioned to Idle
   due to an error.  For a peer that was previously transitioned to
   Idle due to an error, the time between consecutive generation of
   Start events, if such events are generated automatically, shall
   exponentially increase.  The value of the initial timer shall be 60
   seconds.  The time shall be doubled for each consecutive retry.

   Any other event received in the Idle state is ignored.

Connect state:

   In this state BGMP is waiting for the transport protocol
   connection to be completed.

   If the transport protocol connection succeeds, the local system
   clears the ConnectRetry timer, completes initialization, sends an
   OPEN message to its peer, and changes its state to OpenSent.
   If the transport protocol connection fails (e.g., retransmission
   timeout), the local system restarts the ConnectRetry timer,
   continues to listen for a connection that may be initiated by the
   remote BGMP peer, and changes its state to Active.

   In response to the ConnectRetry timer expired event, the local
   system restarts the ConnectRetry timer, initiates a transport
   connection to the other BGMP peer, continues to listen for a
   connection that may be initiated by the remote BGMP peer, and
   stays in the Connect state.

   The Start event is ignored in the Connect state.

   In response to any other event (initiated by either system or
   operator), the local system releases all BGMP resources associated
   with this connection and changes its state to Idle.

Active state:

   In this state BGMP is trying to acquire a peer by listening for an
   incoming transport protocol connection.

   If the transport protocol connection succeeds, the local system
   clears the ConnectRetry timer, completes initialization, sends an
   OPEN message to its peer, sets its Hold Timer to a large value,
   and changes its state to OpenSent.  A Hold Timer value of 4
   minutes is suggested.

   In response to the ConnectRetry timer expired event, the local
   system restarts the ConnectRetry timer, initiates a transport
   connection to the other BGMP peer, continues to listen for a
   connection that may be initiated by the remote BGMP peer, and
   changes its state to Connect.

   If the local system detects that a remote peer is trying to
   establish a BGMP connection to it, and the IP address of the remote
   peer is not an expected one, the local system restarts the
   ConnectRetry timer, rejects the attempted connection, continues to
   listen for a connection that may be initiated by the remote BGMP
   peer, and stays in the Active state.

   The Start event is ignored in the Active state.

   In response to any other event (initiated by either system or
   operator), the local system releases all BGMP resources associated
   with this connection and changes its state to Idle.

OpenSent state:

   In this state BGMP waits for an OPEN message from its peer.  When
   an OPEN message is received, all fields are checked for
   correctness.  If the BGMP message header checking or OPEN message
   checking detects an error (see Section 8.2), or a connection
   collision (see Section 8.8), the local system sends a NOTIFICATION
   message and changes its state to Idle.

   If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
   message and sets a KeepAlive timer.  The Hold Timer, which was
   originally set to a large value (see above), is replaced with the
   negotiated Hold Time value (see section 4.2).  If the negotiated
   Hold Time value is zero, then the Hold Timer and KeepAlive
   timer are not started.  If the value of the Autonomous System
   field is the same as the local Autonomous System number, then the
   connection is an "internal" connection; otherwise, it is
   "external".  Finally, the state is changed to OpenConfirm.

   If a disconnect notification is received from the underlying
   transport protocol, the local system closes the BGMP connection,
   restarts the ConnectRetry timer, continues to listen for a
   connection that may be initiated by the remote BGMP peer, and goes
   into the Active state.

   If the Hold Timer expires, the local system sends a NOTIFICATION
   message with Error Code Hold Timer Expired and changes its state
   to Idle.

   In response to the Stop event (initiated by either system or
   operator) the local system sends a NOTIFICATION message with Error
   Code Cease and changes its state to Idle.

   The Start event is ignored in the OpenSent state.
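
   The OpenSent rules listed so far can be written down as a small
   transition table.  This is a minimal sketch for illustration only:
   the enum and function names (State, Event, open_sent_step) are
   hypothetical, the action strings are informal summaries rather
   than protocol elements, and only the events named above are
   covered.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class State(Enum):
    IDLE = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()
    OPEN_CONFIRM = auto()


class Event(Enum):
    OPEN_OK = auto()                  # OPEN received, no errors
    OPEN_ERROR_OR_COLLISION = auto()  # header/OPEN error or collision
    TRANSPORT_DISCONNECT = auto()
    HOLD_TIMER_EXPIRED = auto()
    STOP = auto()
    START = auto()


# event -> (informal action summary or None, next state)
OPEN_SENT_TRANSITIONS: Dict[Event, Tuple[Optional[str], State]] = {
    Event.OPEN_OK: ("send KEEPALIVE, set timers", State.OPEN_CONFIRM),
    Event.OPEN_ERROR_OR_COLLISION: ("send NOTIFICATION", State.IDLE),
    Event.TRANSPORT_DISCONNECT: ("close, restart ConnectRetry", State.ACTIVE),
    Event.HOLD_TIMER_EXPIRED: ("send NOTIFICATION (Hold Timer Expired)",
                               State.IDLE),
    Event.STOP: ("send NOTIFICATION (Cease)", State.IDLE),
    Event.START: (None, State.OPEN_SENT),  # ignored in this state
}


def open_sent_step(event: Event) -> Tuple[Optional[str], State]:
    """Look up the action and next state for an event in OpenSent."""
    return OPEN_SENT_TRANSITIONS[event]
```

   A table of this shape keeps each state's behaviour in one place,
   which makes it easy to compare against the prose.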

   In response to any other event the local system sends a
   NOTIFICATION message with Error Code Finite State Machine Error
   and changes its state to Idle.

   Whenever BGMP changes its state from OpenSent to Idle, it closes
   the BGMP (and transport-level) connection and releases all
   resources associated with that connection.

OpenConfirm state:

   In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.

   If the local system receives a KEEPALIVE message, it changes its
   state to Established.

   If the Hold Timer expires before a KEEPALIVE message is received,
   the local system sends a NOTIFICATION message with Error Code Hold
   Timer Expired and changes its state to Idle.

   If the local system receives a NOTIFICATION message, it changes
   its state to Idle.

   If the KeepAlive timer expires, the local system sends a KEEPALIVE
   message and restarts its KeepAlive timer.

   If a disconnect notification is received from the underlying
   transport protocol, the local system changes its state to Idle.

   In response to the Stop event (initiated by either system or
   operator) the local system sends a NOTIFICATION message with Error
   Code Cease and changes its state to Idle.

   The Start event is ignored in the OpenConfirm state.

   In response to any other event the local system sends a
   NOTIFICATION message with Error Code Finite State Machine Error
   and changes its state to Idle.

   Whenever BGMP changes its state from OpenConfirm to Idle, it
   closes the BGMP (and transport-level) connection and releases all
   resources associated with that connection.

Established state:

   In the Established state BGMP can exchange UPDATE, NOTIFICATION,
   and KEEPALIVE messages with its peer.

   If the local system receives an UPDATE or KEEPALIVE message, it
   restarts its Hold Timer, if the negotiated Hold Time value is
   non-zero.

   If the local system receives a NOTIFICATION message, it changes
   its state to Idle.

   If the local system receives an UPDATE message and the UPDATE
   message error handling procedure (see Section 8.3) detects an
   error, the local system sends a NOTIFICATION message and changes
   its state to Idle.

   If a disconnect notification is received from the underlying
   transport protocol, the local system changes its state to Idle.

   If the Hold Timer expires, the local system sends a NOTIFICATION
   message with Error Code Hold Timer Expired and changes its state
   to Idle.

   If the KeepAlive timer expires, the local system sends a KEEPALIVE
   message and restarts its KeepAlive timer.

   Each time the local system sends a KEEPALIVE or UPDATE message, it
   restarts its KeepAlive timer, unless the negotiated Hold Time
   value is zero.

   In response to the Stop event (initiated by either system or
   operator), the local system sends a NOTIFICATION message with
   Error Code Cease and changes its state to Idle.

   The Start event is ignored in the Established state.

   In response to any other event, the local system sends a
   NOTIFICATION message with Error Code Finite State Machine Error
   and changes its state to Idle.

   Whenever BGMP changes its state from Established to Idle, it
   closes the BGMP (and transport-level) connection, releases all
   resources associated with that connection, and deletes all routes
   derived from that connection.

11.  Security Considerations

BGMP uses TCP sessions for all network communication between peers.  TCP
sessions may be secured through the use of IPsec [IPSEC].

12.  Authors' Addresses

Dave Thaler
Microsoft
One Microsoft Way
Redmond, WA 98052
EMail: dthaler@microsoft.com

13.  Normative References

[INTEROP]
   Thaler, D., "Interoperability Rules for Multicast Routing
   Protocols", RFC 2715, October 1999.

[IPSEC]
   Kent, S., and R. Atkinson, "Security Architecture for the Internet
   Protocol", RFC 2401, November 1998.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

[V4PREFIX]
   Thaler, D., "Unicast-Prefix-based IPv4 Multicast Addresses",
   draft-thaler-ipv4-uni-based-mcast-00.txt, Work in progress,
   November 2001.

[V6PREFIX]
   Haberman, B., and D. Thaler, "Unicast-Prefix-based IPv6 Multicast
   Addresses", draft-ietf-ipngwg-uni-based-mcast-03.txt, Work in
   progress, October 2001.

14.  Non-normative References

[BGP]
   Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
   1771, March 1995.

[MBGP]
   Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
   Extensions for BGP-4", RFC 2283, February 1998.

[CBT]
   Ballardie, A., "Core Based Trees (CBT version 2) Multicast
   Routing", RFC 2189, September 1997.

[DVMRP]
   Pusateri, T., "Distance Vector Multicast Routing Protocol",
   draft-ietf-idmr-dvmrp-v3-10.txt, Work in progress, August 2000.

[DWR]
   Fenner, W., "Domain-Wide Reports",
   draft-ietf-idmr-membership-reports-04.txt, Work in progress,
   August 1999.

[IPv6AA]
   Hinden, R., and S. Deering, "IP Version 6 Addressing Architecture",
   RFC 2373, July 1998.

[MOSPF]
   Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
   1994.

[PIMDM]
   Adams, A., Nicholas, J., and W. Siadak, "Protocol Independent
   Multicast - Dense Mode (PIM-DM): Protocol Specification (Revised)",
   draft-ietf-pim-dm-new-v2-01.txt, Work in progress, February 2002.

[PIMSM]
   Estrin, et al., "Protocol Independent Multicast-Sparse Mode
   (PIM-SM): Protocol Specification", RFC 2362, June 1998.

[REFLECT]
   Bates, T., and R. Chandra, "BGP Route Reflection: An alternative to
   full mesh IBGP", RFC 1966, June 1996.

15.  Full Copyright Statement

Copyright (C) The Internet Society (2002).  All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are included
on all such copies and derivative works.  However, this document itself
may not be modified in any way, such as by removing the copyright notice
or references to the Internet Society or other Internet organizations,
except as needed for the purpose of developing Internet standards in
which case the procedures for copyrights defined in the Internet
Standards process must be followed, or as required to translate it into
languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS
IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.

Table of Contents