idnits 2.17.1

draft-ietf-bgmp-spec-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 2
     longer pages, the longest (page 19) being 69 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 3 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 458 has weird spacing: '... target list ...'

  == Line 1454 has weird spacing: '...is less than...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.
     If you have contacted all the original authors and they are all willing
     to grant the BCP78 rights to the IETF Trust, then this is fine, and you
     can ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 2003) is 7498 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'BR32' is mentioned on line 248, but not defined

  == Missing Reference: 'BR41' is mentioned on line 248, but not defined

  == Missing Reference: 'BR31' is mentioned on line 250, but not defined

  == Missing Reference: 'BR42' is mentioned on line 250, but not defined

  == Missing Reference: 'BR43' is mentioned on line 250, but not defined

  == Missing Reference: 'BR22' is mentioned on line 252, but not defined

  == Missing Reference: 'BR52' is mentioned on line 252, but not defined

  == Missing Reference: 'BR53' is mentioned on line 252, but not defined

  == Missing Reference: 'BR21' is mentioned on line 254, but not defined

  == Missing Reference: 'BR51' is mentioned on line 254, but not defined

  == Missing Reference: 'BR12' is mentioned on line 256, but not defined

  == Missing Reference: 'BR61' is mentioned on line 256, but not defined

  == Missing Reference: 'BR13' is mentioned on line 258, but not defined

  == Missing Reference: 'BR71' is mentioned on line 262, but not defined

  == Missing Reference: 'BR81' is mentioned on line 262, but not defined

  == Missing Reference: 'BRXY' is mentioned on line 266, but not defined

  == Missing Reference: 'PIM-SM' is mentioned on line 289, but not defined

  == Unused Reference: 'DVMRP' is defined on line 1923, but no explicit
     reference was found in the text

  == Unused Reference: 'MOSPF' is
     defined on line 1935, but no explicit reference was found in the text

  == Unused Reference: 'PIMDM' is defined on line 1939, but no explicit
     reference was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 2715 (ref.
     'INTEROP')

  ** Obsolete normative reference: RFC 2401 (ref. 'IPSEC') (Obsoleted by RFC
     4301)

  == Outdated reference: A later version (-06) exists of
     draft-ietf-mboned-ipv4-uni-based-mcast-00

  -- Obsolete informational reference (is this intentional?): RFC 1771 (ref.
     'BGP') (Obsoleted by RFC 4271)

  -- Obsolete informational reference (is this intentional?): RFC 2858 (ref.
     'MBGP') (Obsoleted by RFC 4760)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-idmr-dvmrp-v3-10

  == Outdated reference: A later version (-05) exists of
     draft-ietf-idmr-membership-reports-04

  -- Obsolete informational reference (is this intentional?): RFC 3513 (ref.
     'IPv6AA') (Obsoleted by RFC 4291)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-pim-dm-new-v2-01

  == Outdated reference: A later version (-12) exists of
     draft-ietf-pim-sm-v2-new-07

  -- Obsolete informational reference (is this intentional?): RFC 1966 (ref.
     'REFLECT') (Obsoleted by RFC 4456)

     Summary: 5 errors (**), 0 flaws (~~), 31 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  BGMP Working Group                                        D. Thaler
3  Internet Engineering Task Force                           Microsoft
4  INTERNET-DRAFT                                             May 2003
5  Expires October 2003

7  Border Gateway Multicast Protocol (BGMP):
8  Protocol Specification
9

11  Status of this Memo

13  This document is an Internet-Draft and is in full conformance with all
14  provisions of Section 10 of RFC2026.

16  Internet-Drafts are working documents of the Internet Engineering Task
17  Force (IETF), its areas, and its working groups. Note that other groups
18  may also distribute working documents as Internet-Drafts.
20  Internet-Drafts are draft documents valid for a maximum of six months
21  and may be updated, replaced, or obsoleted by other documents at any
22  time. It is inappropriate to use Internet-Drafts as reference material
23  or to cite them other than as "work in progress."

25  The list of current Internet-Drafts can be accessed at
26  http://www.ietf.org/1id-abstracts.html

28  The list of Internet-Draft Shadow Directories can be accessed at
29  http://www.ietf.org/shadow.html

31  Abstract

33  This document describes BGMP, a protocol for inter-domain multicast
34  routing. BGMP builds shared trees for active multicast groups, and
35  optionally allows receiver domains to build source-specific, inter-
36  domain, distribution branches where needed. BGMP natively supports

37  Draft                           BGMP                          May 2003

39  "source-specific multicast" (SSM). To also support "any-source
40  multicast" (ASM), BGMP requires that each multicast group be associated
41  with a single root (in BGMP it is referred to as the root domain). It
42  requires that different ranges of the class D space are associated
43  (e.g., with Unicast-Prefix-Based Multicast addressing) with different
44  domains. Each of these domains then becomes the root of the shared
45  domain-trees for all groups in its range. Multicast participants will
46  generally receive better multicast service if the session initiator's
47  address allocator selects addresses from its own domain's part of the
48  space, thereby causing the root domain to be local to at least one of
49  the session participants.

51  Copyright Notice

53  Copyright (C) The Internet Society (2002). All Rights Reserved.

55  1. Acknowledgements

57  In addition to the editors, the following individuals have
58  contributed to the design of BGMP: Cengiz Alaettinoglu, Tony
59  Ballardie, Steve Casner, Steve Deering, Deborah Estrin, Dino
60  Farinacci, Bill Fenner, Mark Handley, Ahmed Helmy, Van Jacobson, Dave
61  Meyer, and Satish Kumar.
63  This document is the product of the IETF BGMP Working Group with Dave
64  Thaler as editor.

66  Rusty Eddy, Isidor Kouvelas, and Pavlin Radoslavov also provided
67  valuable feedback on this document.

69  2. Purpose

71  It has been suggested that inter-domain "any-source" multicast is
72  better supported with a rendezvous mechanism whereby members receive
73  source's data packets without any sort of global broadcast (e.g.,
74  MSDP broadcasts source information, PIM-DM and DVMRP broadcast
75  initial data packets, and MOSPF broadcasts membership information).
76  PIM-SM [PIMSM] and CBT [CBT] use a shared group-tree, to which all
77  members join and thereby hear from all sources (and to which non-
78  members do not join and thereby hear from no sources).

80  This document describes BGMP, a protocol for inter-domain multicast
81  routing. BGMP natively supports "source-specific multicast" (SSM).

85  To also support "any-source multicast" (ASM), BGMP builds shared
86  trees for active multicast groups, and allows domains to build
87  source-specific, inter-domain, distribution branches where needed.
88  Building upon concepts from PIM-SM and CBT, BGMP requires that each
89  global multicast group be associated with a single root. However, in
90  BGMP, the root is an entire exchange or domain, rather than a single
91  router.

93  For non-source-specific groups, BGMP assumes that ranges of the multicast
94  address space have been associated (e.g., with Unicast-Prefix-Based
95  Multicast [V4PREFIX,V6PREFIX] addressing) with selected domains.
96  Each such domain then becomes the root of the shared domain-trees for
97  all groups in its range. An address allocator will generally achieve
98  better distribution trees if it takes its multicast addresses from
99  its own domain's part of the space, thereby causing the root domain
100  to be local.

102  BGMP uses TCP as its transport protocol.
     This eliminates the need to
103  implement message fragmentation, retransmission, acknowledgement, and
104  sequencing. BGMP uses TCP port 264 for establishing its connections.
105  This port is distinct from BGP's port to provide protocol
106  independence, and to facilitate distinguishing between protocol
107  packets (e.g., by packet classifiers, diagnostic utilities, etc.).

109  Two BGMP peers form a TCP connection between one another, and
110  exchange messages to open and confirm the connection parameters.
111  They then send incremental Join/Prune Updates as group memberships
112  change. BGMP does not require periodic refresh of individual
113  entries. KeepAlive messages are sent periodically to ensure the
114  liveness of the connection. Notification messages are sent in
115  response to errors or special conditions. If a connection encounters
116  an error condition, a notification message is sent and the connection
117  is closed if the error is a fatal one.

119  3. Revision History

121  29 May 2003 draft-01

123  (1) Removed all references to MASC and G-RIB. The current spec only
124      covers BGMP operation for source-specific groups, and any-source-
125      multicast using unicast prefix-based multicast addresses (for
126      both IPv4 and IPv6). No new routes of any type are needed in the
127      routing table.

131  (2) Removed section on transitioning away from using DVMRP as the
132      backbone to an AS-based multicast routing system with MBGP, as
133      this has already happened.

135  4. Terminology

137  This document uses the following technical terms:

139  Domain:
140     A set of one or more contiguous links and zero or more routers
141     surrounded by one or more multicast border routers. Note that this
142     loose definition of domain also applies to an external link between
143     two domains, as well as an exchange.

145  Root Domain:
146     When constructing a shared tree of domains for some group, one
147     domain will be the "root" of the tree.
        The root domain receives
148     data from each sender to the group, and functions as a rendezvous
149     domain toward which member domains can send inter-domain joins, and
150     to which sender domains can send data.

152  Multicast RIB:
153     The Routing Information Base, or routing table, used to calculate
154     the "next-hop" towards a particular address for multicast traffic.

156  Multicast IGP (M-IGP):
157     A generic term for any multicast routing protocol used for tree
158     construction within a domain. Typical examples of M-IGPs are: PIM-
159     SM, PIM-DM, DVMRP, MOSPF, and CBT.

161  EGP: A generic term for the interdomain unicast routing protocol in use.
162     Typically, this will be some version of BGP which can support a
163     Multicast RIB, such as MBGP [MBGP], containing both unicast and
164     multicast address prefixes.

166  Component:
167     The portion of a border router associated with (and logically
168     inside) a particular domain that runs the multicast IGP (M-IGP) for
169     that domain, if any. Each border router thus has zero or more
170     components inside routing domains. In addition, each border router
171     with external links that do not fall inside any routing domain will
172     have an inter-domain component that runs BGMP.

176  External peer:
177     A border router in another multicast AS (autonomous system, as used
178     in BGP), to which a BGMP TCP-connection is open. If BGP is being
179     used as the EGP, a separate "eBGP" TCP-connection will also be open
180     to the same peer.

182  Internal peer:
183     Another border router of the same multicast AS. If BGP is being
184     used as the EGP, the border router either speaks iBGP ("internal"
185     BGP) directly to internal peers in a full mesh, or indirectly
186     through a route reflector [REFLECT].
188  Next-hop peer:
189     The next-hop peer towards a given IP address is the next EGP router
190     on the path to the given address, according to multicast RIB routes
191     in the EGP's routing table (e.g., in MBGP, routes whose Subsequent
192     Address Family Identifier field indicates that the route is valid
193     for multicast traffic).

195  target:
196     Either an EGP peer, or an M-IGP component.

198  Tree State Table:
199     This is a table of (S-prefix,G) and (*,G-prefix) entries that have
200     been explicitly joined by a set of targets. Each entry has, in
201     addition to the source and group addresses and masks, a list of
202     targets that have explicitly requested data (on behalf of directly
203     connected hosts or downstream routers). (S,G) entries also have an
204     "SPT" bit.

206  The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in
207  this document are to be interpreted as described in [RFC2119].

209  5. Protocol Overview

211  BGMP maintains group-prefix state in response to messages from BGMP
212  peers and notifications from M-IGP components. Group-shared trees
213  are rooted at the domain advertising the group prefix covering those
214  groups. When a receiver joins a specific group address, the border
215  router towards the root domain generates a group-specific Join
216  message, which is then forwarded Border-Router-by-Border-Router
217  towards the root domain (see Figure 1). BGMP Join and Prune messages
218  are sent over TCP connections between BGMP peers, and BGMP protocol
219  state is refreshed by KEEPALIVE messages periodically sent over TCP.

223  BGMP routers build group-specific bidirectional forwarding state as
224  they process the BGMP Join messages. Bidirectional forwarding state
225  means that packets received from any target are forwarded to all
226  other targets in the target list without any RPF checks. No group-
227  specific state or traffic exists in parts of the network where there
228  are no members of that group.
230  BGMP routers optionally build source-specific unidirectional
231  forwarding state, only where needed, to be compatible with source-
232  specific trees (SPTs) used by some M-IGPs (e.g., DVMRP, PIM-DM, or
233  PIM-SM), or to construct trees for source-specific groups. A domain
234  that uses an SPT-based M-IGP may need to inject multicast packets
235  from external sources via different border routers (to be compatible
236  with the M-IGP RPF checks) which thus act as "surrogates". For
237  example, in the Transit_1 domain, data from Src_A arrives at BR12,
238  but must be injected by BR11. A surrogate router may create a
239  source-specific BGMP branch if no shared tree state exists. Note:
240  stub domains with a single border router, such as Rcvr_Stub_7 in
241  Figure 1, receive all multicast data packets through that router, to
242  which all RPF checks point. Therefore, stub domains never build
243  source-specific state.

245                       Root_Domain
246              [BR91]--------------------------\
247                 |                            |
248              [BR32]                       [BR41]
249            Transit_3                     Transit_4
250              [BR31]                   [BR42] [BR43]
251                 |                        |      |
252              [BR22]                   [BR52] [BR53]
253            Transit_2                     Transit_5
254              [BR21]                       [BR51]
255                 |                            |
256              [BR12]                       [BR61]
257            Transit_1[BR11]----------[BR62]Stub_6
258              [BR13]                       (Src_A)
259                 |                         (Rcvr_D)
260        -------------------
261        |                 |
262     [BR71]            [BR81]
263   Rcvr_Stub_7     Src_only_Stub_8
264    (Rcvr_C)           (Src_B)

266  Figure 1: Example inter-domain topology. [BRXY] represents a BGMP
267  border

271  router. Transit_X is a transit domain network. *_Stub_X is a stub
272  domain network.

274  Data packets are forwarded based on a combination of BGMP and M-IGP
275  rules. The router forwards to a set of targets according to a
276  matching (S,G) BGMP tree state entry if it exists. If not found, the
277  router checks for a matching (*,G) BGMP tree state entry.
     If neither
278  is found, then the packet is sent natively to the next-hop EGP peer
279  for G, according to the Multicast RIB (for example, in the case of a
280  non-member sender such as Src_B in Figure 1). If a matching entry
281  was found, the packet is forwarded to all other targets in the target
282  list. In this way BGMP trees forward data in a bidirectional manner.
283  If a target is an M-IGP component then forwarding is subject to the
284  rules of that M-IGP protocol.

286  5.1. Design Rationale

288  Several other protocols, or protocol proposals, build shared trees
289  within domains [PIM-SM, CBT]. The design choices made for BGMP
290  result from our focus on Inter-Domain multicast in particular. The
291  design choices made by PIM-SM and CBT are better suited to the wide-
292  area intra-domain case. There are three major differences between
293  BGMP and other shared-tree protocols:

295  (1) Unidirectional vs. Bidirectional trees

297  Bidirectional trees (using bidirectional forwarding state as
298  described above) minimize third party dependence which is essential
299  in the inter-domain context. For example, in Figure 1, stub domains
300  7 and 8 would like to exchange multicast packets without being
301  dependent on the quality of connectivity of the root domain.
302  However, unidirectional shared trees (i.e., those using RPF checks)
303  have more aggressive loop prevention and share the same processing
304  rules as source-specific entries which are inherently unidirectional.

306  The lack of third party dependence concerns in the INTRA domain case
307  reduces the incentive to employ bidirectional trees. BGMP supports
308  bidirectional trees because it has to, and because it can without
309  excessive cost.
311  (2) Source-specific distribution trees/branches

313  In a departure from other shared tree protocols, source-specific BGMP

317  state is built ONLY where (a) it is needed to pull the multicast
318  traffic down to a BGMP router that has source-specific (S,G) state,
319  and (b) that router is NOT already on the shared tree (i.e., has no
320  (*,G) state), and (c) that router does not want to receive packets
321  via encapsulation from a router which is on the shared tree. BGMP
322  provides source-specific branches because most M-IGP protocols in use
323  today build source-specific trees. BGMP's source-specific branches
324  eliminate the unnecessary overhead of encapsulations for high data
325  rate sources from the shared tree's ingress router to the surrogate
326  injector (e.g. from BR12 to BR11 in Figure 1). Moreover, cases in
327  which shared paths are significantly longer than SPT paths will also
328  benefit.

330  However, except for source-specific group distribution trees, we do
331  not build source-specific inter-domain trees in general because (a)
332  inter-domain connectivity is generally less rich than intra-domain
333  connectivity, so shared distribution trees should have more
334  acceptable path length and traffic concentration properties in the
335  inter-domain context, than in the intra-domain case, and (b) by
336  having the shared tree state always take precedence over source-
337  specific tree state, we avoid ambiguities that can otherwise arise.

339  In summary, BGMP trees are, in a sense, a hybrid between PIM-SM and
340  CBT trees.

342  (3) Method of choosing root of group shared tree

344  The choice of a group's shared-tree-root has implications for
345  performance and policy. In the intra-domain case it is sometimes
346  assumed that all potential shared-tree roots (RPs/Cores) within the
347  domain are equally suited to be the root for a group that is
348  initiated within that domain.
     In the INTER-domain case, there is far
349  more opportunity for unacceptably poor locality, and administrative
350  control of a group's shared-tree root. Therefore in the intra-domain
351  case, other protocols sometimes treat all candidate roots (RPs or
352  Cores) as equivalent and emphasize load sharing and stability to
353  maximize performance. In the Inter-Domain case, all roots are not
354  equivalent, and we adopt an approach whereby a group's root domain is
355  not random but is subject to administrative control.

357  6. Protocol Details

359  In this section, we describe the detailed protocol that border
360  routers perform. We assume that each border router conforms to the

364  component-based model described in [INTEROP], modulo one correction
365  to section 3.2 ("BGMP" Dispatcher), as follows:

367  The iif owner of a (*,G) entry is the component owning the next-hop
368  interface towards the nominal root of G, in the multicast RIB.

370  6.1. Interaction with the EGP

372  The fundamental requirements imposed by BGMP are that:

374  (1) For a given source-specific group and source, BGMP must be able
375      to look up the next-hop towards the source in the Multicast RIB,
376      and

378  (2) For a given non-source-specific group, BGMP will map the group
379      address to a nominal "root" address, and must be able to look up
380      the next-hop towards that address in the Multicast RIB.

382  BGMP determines the nominal "root" address as follows. If the multicast
383  address is a Unicast-Prefix-based Multicast address (for either IPv4 or
384  IPv6), then the nominal root address is the embedded unicast prefix,
385  padded with a suffix of 0 bits to form a full address.

387  For example, if the IPv6 group address is
388  ff2e:0100:1234:5678:9abc:def0::123, then the unicast prefix is
389  1234:5678:9abc:def0/64, and the nominal root address would be
390  1234:5678:9abc:def0::. (This address is in fact the Subnet-Router
391  anycast address [IPv6AA].)
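
The IPv6 derivation above can be sketched in a few lines of Python. This is
only an illustration, not part of the specification: the function name
nominal_root_v6 is hypothetical, the prefix length is passed in by the caller
rather than parsed from the group address, and the sketch assumes that the
embedded prefix occupies the 64 bits that follow the first 32 bits of the
group address, as in the example.

```python
import ipaddress


def nominal_root_v6(group: str, plen: int = 64) -> ipaddress.IPv6Address:
    """Illustrative sketch: nominal root of a unicast-prefix-based group."""
    if not 0 <= plen <= 64:
        raise ValueError("prefix length must be between 0 and 64")
    g = int(ipaddress.IPv6Address(group))
    # Assumed layout: 32 leading bits, a 64-bit prefix field, then a
    # 32-bit group ID.
    prefix_field = (g >> 32) & ((1 << 64) - 1)
    # Keep only the leading plen bits of the prefix field.
    prefix = prefix_field >> (64 - plen) << (64 - plen)
    # Pad with a suffix of 0 bits to form a full 128-bit address.
    return ipaddress.IPv6Address(prefix << 64)


print(nominal_root_v6("ff2e:0100:1234:5678:9abc:def0::123"))
# prints 1234:5678:9abc:def0::
```

The result is what a BGMP router would then look up in the Multicast RIB to
find the next-hop peer for the group.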
393  As an IPv4 example, if the IPv4 group address were 225.1.2.3, then the
394  nominal root address would be 1.2.3.0.

396  Support for any-source-multicast using any address other than a Unicast-
397  prefix-based Multicast Address is outside the scope of this document.

399  6.2. Multicast Data Packet Processing

401  For BGMP rules to be applied, an incoming packet must first be
402  "accepted":

404  o If the packet arrived on an interface owned by an M-IGP, the M-IGP
405    component determines whether the packet should be accepted or
406    dropped according to its rules. If the packet is accepted, the

410    packet is forwarded (or not forwarded) out any other interfaces
411    owned by the same component, as specified by the M-IGP.

413  o If the packet was received over a point-to-point interface owned
414    by BGMP, the packet is accepted.

416  o If the packet arrived on a multiaccess network interface owned by
417    BGMP, the packet is accepted if it is receiving data on a source-
418    specific branch, if it is the designated forwarder for the longest
419    matching route for S, or for the longest matching route for the
420    nominal root of G.

422  If the packet is accepted, then the router checks the tree state
423  table for a matching (S,G) entry. If one is found, but the packet
424  was not received from the next hop target towards S (if the entry's
425  SPT bit is True), or was not received from the next hop target
426  towards G (if the entry's SPT bit is False) then the packet is
427  dropped and no further actions are taken. If no (S,G) entry was
428  found, the router then checks for a matching (*,G) entry.

430  If neither is found, then the packet is forwarded towards the next-
431  hop peer for the nominal root of G, according to the Multicast RIB.
432  If a matching entry was found, the packet is forwarded to all other
433  targets in the target list.
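
The lookup order for an accepted packet can be summarised in a short Python
sketch. It is a simplified model under stated assumptions, not the
specification's algorithm: the names Entry and forward are hypothetical, the
tree state table is a plain dictionary keyed by exact (S,G) or ("*",G) pairs
rather than by prefixes, targets are opaque strings, and "the next hop target
towards G" is taken to mean the next hop towards the nominal root of G.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    targets: set = field(default_factory=set)
    spt: bool = False  # only meaningful for (S,G) entries


def forward(table, src, group, arrived_from, nh_to_src, nh_to_root):
    """Return the set of targets an already-accepted packet is sent to."""
    sg = table.get((src, group))
    if sg is not None:
        # (S,G) state: check the arrival target against the SPT bit.
        expected = nh_to_src if sg.spt else nh_to_root
        if arrived_from != expected:
            return set()  # dropped, no further action
        return sg.targets - {arrived_from}
    star_g = table.get(("*", group))
    if star_g is not None:
        # Bidirectional shared tree: all other targets, no RPF check.
        return star_g.targets - {arrived_from}
    # No state at all: send towards the nominal root of G.
    return {nh_to_root}


table = {("*", "G1"): Entry(targets={"peer_up", "migp"})}
print(forward(table, "S1", "G1", "migp", "peer_src", "peer_up"))
# prints {'peer_up'}
```

When a returned target is an M-IGP component, the actual transmission is
still subject to that component's own forwarding rules, which this sketch
does not model.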
435  Forwarding to a target which is an M-IGP component means that the
436  packet is forwarded out any interfaces owned by that component
437  according to that component's multicast forwarding rules.

439  6.3. BGMP processing of Join and Prune messages and notifications

441  6.3.1. Receiving Joins

443  When the BGMP component receives a (*,G) or (S,G) Join alert from
444  another component, or a BGMP (S,G) or (*,G) Join message from an
445  external peer, it searches the tree state table for a matching entry.
446  If an entry is found, and that peer is already listed in the target
447  list, then no further actions are taken.

449  Otherwise, if no (*,G) or (S,G) entry was found, one is created. In
450  the case of a (*,G), the target list is initialized to contain the
451  next-hop peer towards the nominal root of G, if it is an external
452  peer. If the peer is internal, the target list is initialized to
453  contain the M-IGP component owning the next-hop interface. If there

457  is no next-hop peer (because the nominal root of G is inside the
458  domain), then the target list is initialized to contain the next-hop
459  component. If an (S,G) entry exists for the same G for which the
460  (*,G) Join is being processed, and the next-hop peers toward S and
461  the nominal root of G are different, the BGMP router must first send
462  a (S,G) Prune message toward the source and clear the SPT bit on the
463  (S,G) entry, before activating the (*,G) entry.

465  When creating (S,G) state, if the source is internal to the BGMP
466  speaker's domain, a "Poison-Reverse" bit (PR-bit) is set. This bit
467  indicates that the router may receive packets matching (S,G) anyway
468  due to the BGMP speaker being a member of a domain on the path
469  between S and the root domain. (Depending on the M-IGP protocol, it
470  may in fact receive such packets anyway only if it is the best exit
471  for the nominal root of G.)
473  The target from which the Join was received is then added to the
474  target list. The router then looks up S or the nominal root of G in
475  the Multicast RIB to find the next-hop EGP peer. If the target list,
476  not including the next-hop target towards G for a (*,G) entry,
477  becomes non-null as a result, the next-hop EGP peer must be notified
478  as follows:

480  a) If the next-hop peer towards the nominal root of G (for a (*,G)
481     entry) is an external peer, a BGMP (*,G) Join message is unicast
482     to the external peer. If the next-hop peer towards S (for an
483     (S,G) entry) is an external peer, and the router does NOT have any
484     active (*,G) state for that group address G, a BGMP (S,G) Join
485     message is unicast to the external peer. A BGMP (S,G) Join
486     message is never sent to an external peer by a router that also
487     contains active (*,G) state for the same group. If the next-hop
488     peer towards S (for an (S,G) entry) is an external peer and the
489     router DOES have active (*,G) state for that group G, the SPT bit
490     is always set to False.

492  b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join
493     alert is sent to the M-IGP component owning the next-hop
494     interface.

496  c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent
497     to the M-IGP component owning the next-hop interface.

499  Finally, if an (S,G) Join is received from an internal peer, the
500  peer should be stored with the M-IGP component target. If (S,G)

504  state exists with the PR-bit set, and the next-hop towards the
505  nominal root for G is through the M-IGP component, an (S,G)
506  Poison-Reverse message is immediately sent to the internal peer.
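
As a rough illustration of the (*,G) part of the Join procedure above, the
following Python sketch tracks a target list and reports which notification,
if any, goes upstream. It is deliberately incomplete and rests on
assumptions: the function name receive_star_g_join is hypothetical, the table
maps a group directly to a set of targets, the next hop is passed in as a
(kind, name) pair where kind is "external" or "component", and (S,G) state,
the SPT bit and Poison-Reverse handling are all omitted.

```python
def receive_star_g_join(table, group, joiner, next_hop):
    """Process a (*,G) Join; return (action, recipient) or None."""
    entry = table.get(group)
    if entry is not None and joiner in entry:
        return None  # already listed: no further action
    if entry is None:
        # New entry: the target list starts with the next-hop target.
        entry = table[group] = {next_hop}
    had_members = bool(entry - {next_hop})
    entry.add(joiner)
    # Notify upstream only when the list, not counting the next-hop
    # target, has just become non-empty.
    if not had_members and entry - {next_hop}:
        kind, name = next_hop
        if kind == "external":
            return ("send (*,G) Join message", name)
        return ("send (*,G) Join alert", name)
    return None


table = {}
up = ("external", "BR32")
print(receive_star_g_join(table, "G1", ("component", "migp"), up))
# prints ('send (*,G) Join message', 'BR32')
print(receive_star_g_join(table, "G1", ("external", "BR22"), up))
# prints None
```

The second call returns None because the upstream peer has already been told
about the group; only the local target list changes.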
508  If an (S,G) Join is received from an external peer, and (S,G)
509  state exists with the PR-bit set, and the local BGMP speaker is
510  the best exit for the nominal root of G, and the next-hop towards
511  the nominal root for G is through the interface towards the
512  external peer, an (S,G) Poison-Reverse message is immediately sent
513  to the external peer.

515  6.3.2. Receiving Prune Notifications

517  When the BGMP component receives a (*,G) or (S,G) Prune alert from
518  another component, or a BGMP (*,G) or (S,G) Prune message from an
519  external peer, it searches the tree state table for a matching entry.
520  If no (S,G) entry was found for an (S,G) Prune, but (*,G) state
521  exists, an (S,G) entry is created, with the target list copied from
522  the (*,G) entry. If no matching entry exists, or if the component or
523  peer is not listed in the target list, no further actions are taken.
524  Otherwise, the component or peer is removed from the target list. If
525  the target list becomes null as a result, the next-hop peer towards
526  the nominal root of G (for a (*,G) entry), or towards S (for an (S,G)
527  entry if and only if the BGMP router does NOT have any corresponding
528  (*,G) entry), must be notified as follows.

530  a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune
531     message is unicast to it.

533  b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune
534     alert is sent to the M-IGP component owning the next-hop
535     interface.

537  c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent
538     to the M-IGP component owning the next-hop interface.

540  6.3.3. Receiving Route Change Notifications

542  When a border router receives a route for a new prefix in the
543  multicast RIB, or an existing route for a prefix is withdrawn, a route
544  change notification for that prefix must be sent to the BGMP

548  component.
     In addition, when the next hop peer (according to the
549  multicast RIB) changes, a route change notification for that prefix
550  must be sent to the BGMP component.

552  In addition, in IPv4 (only), an internal route for each class-D
553  prefix associated with the domain (if any) MUST be injected into the
554  multicast RIB in the EGP by the domain's border routers.

556  When a route for a new group prefix is learned, or an existing route
557  for a group prefix is withdrawn, or the next-hop peer for a group
558  prefix changes, a BGMP router updates all affected (*,G) target
559  lists. The router sends a (*,G) Join to the new next-hop target, and
560  a (*,G) Prune to the old next-hop target, as appropriate. In
561  addition, if any (S,G) state exists with the PR-bit set:

563  o If the BGMP speaker has just become the best exit for the nominal
564    root of G, an (S,G) Poison Reverse message with the PR-bit set is
565    sent as noted below.

567  o If the BGMP speaker was the best exit for the nominal root of G
568    and is no longer, an (S,G) Poison Reverse message with the PR-bit
569    clear is sent as noted below.
570  The (S,G) Poison-Reverse messages are sent to all external peers on
571  the next-hop interface towards the nominal root of G from which (S,G)
572  Joins have been received.

574  When an existing route for a source prefix is withdrawn, or the next-
575  hop peer for a source prefix changes, a BGMP router updates all
576  affected (S,G) target lists. The router sends a (S,G) Join to the
577  new next-hop target, and a (S,G) Prune to the old next-hop target, as
578  appropriate.

580  6.3.4. Receiving (S,G) Poison-Reverse messages

582  When a BGMP speaker receives an (S,G) Poison-Reverse message from a
583  peer, it sets the PR-bit on the (S,G) state to match the PR-bit in
584  the message, and looks up the next-hop towards the nominal root of G.
585 If the next-hop target is an M-IGP component, it forwards the (S,G) 586 Poison-Reverse message to all internal peers of that component from 587 which it has received (S,G) Joins. If the next-hop target is an 588 external peer on a given interface, it forwards the (S,G) Poison- 589 Reverse message to all external peers on that interface. 591 When a BGMP speaker receives an (S,G) Poison-Reverse message from an 593 Draft BGMP May 2003 595 external peer, with the PR-bit set, and the speaker has received no 596 (S,G) Joins from any other peers (e.g., only from the M-IGP, or has 597 (S,G) state due to encapsulation as described in 6.4.1), it knows 598 that its own (S,G) Join is unnecessary, and should send an (S,G) 599 Prune. 601 When a BGMP speaker receives an (S,G) Poison-Reverse message from an 602 internal peer, with the PR-bit set, and the speaker is the best exit 603 for the nominal root of G, and has (S,G) prune state, an (S,G) Join 604 message is sent to cancel the prune state and the state is deleted. 606 6.4. Interaction with M-IGP components 608 When an M-IGP component on a border router first learns that there 609 are internally-reached members for a group G (whose scope is larger 610 than that domain), a (*,G) Join alert is sent to the BGMP component. 611 Similarly, when an M-IGP component on a border router learns that 612 there are no longer internally-reached members for a group G (whose 613 scope is larger than a single domain), a (*,G) Prune alert is sent to 614 the BGMP component. 616 At any time, any M-IGP domain MAY decide to join a source-specific 617 branch for some external source S and group G. When the M-IGP 618 component in the border router that is the next-hop router for a 619 particular source S learns that a receiver wishes to receive data 620 from S on a source-specific path, an (S,G) Join alert is sent to the 621 BGMP component.
When it is learned that such receivers no longer 622 exist, an (S,G) Prune alert is sent to the BGMP component. Recall 623 that the BGMP component will generate external source-specific Joins 624 only where the source-specific branch does not coincide with the 625 shared distribution tree for that group. 627 Finally, we require that the border router that is the next-hop 628 internal peer for a particular address S or the nominal root of G be 629 able to forward data for a matching tree state table entry to all 630 members within the domain. This requirement has implications for 631 specific M-IGPs as follows. 633 6.4.1. Interaction with DVMRP and PIM-DM 635 DVMRP and PIM-DM are both "broadcast and prune" protocols in which 636 every data packet must pass an RPF check against the packet's source 637 address, or be dropped. If the border router receiving packets from 639 Draft BGMP May 2003 641 an external source is the only BR to inject the route for the source 642 into the domain, then there are no problems. For example, this will 643 always be true for stub domains with a single border router (see 644 Figure 1). Otherwise, the border router receiving packets externally 645 is responsible for encapsulating the data to any other border routers 646 that must inject the data into the domain for RPF checks to succeed. 648 When an intended border router injector for a source receives 649 encapsulated packets from another border router in its domain, it 650 should create source-specific (S,G) BGMP state. Note that the border 651 router may be configured to do this on a data-rate triggered basis so 652 that the state is not created for very low data-rate/intermittent 653 sources. If source-specific state is created, then its incoming 654 interface points to the virtual encapsulation interface from the 655 border router that forwarded the packet, and it has an SPT flag that 656 is initialized to be False.
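As an illustration only, the state creation described in the preceding paragraph might be sketched as follows. The names SGState, on_encapsulated_packet, and min_rate_bps are hypothetical and are not defined by BGMP. The rate threshold merely stands in for the configurable data-rate trigger, whose exact form is left to the implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SGState:
    """Hypothetical source-specific (S,G) tree state entry."""
    source: str
    group: str
    incoming_interface: str
    spt: bool = False  # SPT flag is initialized to False


def on_encapsulated_packet(
    table: Dict[Tuple[str, str], SGState],
    source: str,
    group: str,
    encapsulating_br: str,
    observed_rate_bps: float,
    min_rate_bps: float = 0.0,
) -> Optional[SGState]:
    """Create (S,G) state when encapsulated data arrives from another BR.

    min_rate_bps models the optional data-rate trigger (an assumption):
    below it, no state is created for the source.
    """
    key = (source, group)
    if key in table:
        return table[key]
    if observed_rate_bps < min_rate_bps:
        return None
    # The incoming interface points at the virtual encapsulation
    # interface from the border router that forwarded the packet.
    state = SGState(source, group, incoming_interface="encap:" + encapsulating_br)
    table[key] = state
    return state


states: Dict[Tuple[str, str], SGState] = {}
entry = on_encapsulated_packet(states, "192.0.2.1", "224.1.1.1", "BR12", 1000.0, 500.0)
print(entry)
```

The sketch stops at state creation. Later transitions, such as setting the SPT flag, are left out.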
658 When the (S,G) BGMP state is created, the BGMP component will in turn 659 send a BGMP (S,G) Join message to the next-hop external peer towards 660 S if there is no (*,G) state for that same group, G. The (S,G) BGMP 661 state will have the SPT flag set to False if (*,G) BGMP state is 662 present. 664 When the first data packet from S arrives from the external peer and 665 matches on the BGMP (S,G) state, and IF there is no (*,G) state, the 666 router sets the SPT flag to True, resets the incoming interface to 667 point to the external peer, and sends a BGMP (S,G) Prune message to 668 the border router that was encapsulating the packets (e.g., in Figure 669 1, BR11 sends the (Src_A,G) Prune to BR12). When the border router 670 with (*,G) state receives the prune for (S,G), it then deletes that 671 border router from its list of targets. 673 If the decapsulator receives an (S,G) Poison-Reverse message with the 674 PR-bit set, it will forward it to the encapsulator (which may again 675 forward it up the shared tree according to normal BGMP rules), and 676 both will delete their BGMP (S,G) state. 678 PIM-DM and DVMRP present an additional problem, i.e., no protocol 679 mechanism exists for joining and pruning entire groups; only joins 680 and prunes for individual sources are available. If any such domain 681 desires to be able to serve as a transit domain, we require that some 682 form of Domain-Wide Reports (DWRs) [DWR] be available within such 683 domains. Such messages provide the ability to join and prune an 684 entire group across the domain. One simple heuristic to approximate 685 DWRs is to assume that if there are any internally-reached members, 687 Draft BGMP May 2003 689 then at least one of them is a sender. With this heuristic, the 690 presence of any M-IGP (S,G) state for internally-reached sources can 691 be used instead. Sending a data packet to a group is then equivalent 692 to sending a DWR for the group. 694 6.4.2.
Interaction with PIM-SM 696 Protocols such as PIM-SM build unidirectional shared and source- 697 specific trees. As with DVMRP and PIM-DM, every data packet must 698 pass an RPF check against some group-specific or source-specific 699 address. 701 The fewest encapsulations/decapsulations will be done when the intra- 702 domain tree is rooted at the next-hop internal peer (which becomes 703 the RP) towards the nominal root of G, since in general that router 704 will receive the most packets from external sources. To achieve 705 this, each BGMP border router to a PIM-SM domain should send 706 Candidate-RP-Advertisements within the domain for those groups for 707 which it is the shared-domain tree ingress router. When the border 708 router that is the RP for a group G receives an external data packet, 709 it forwards the packet according to the M-IGP (i.e., PIM-SM) shared- 710 tree outgoing interface list. 712 Other border routers will receive data packets from external sources 713 that are farther down the bidirectional tree of domains. When a 714 border router that is not the RP receives an external packet for 715 which it does not have a source-specific entry, the border router 716 treats it like a local source by creating (S,G) state with a Register 717 flag set, based on normal PIM-SM rules; the Border router then 718 encapsulates the data packets in PIM-SM Registers and unicasts them 719 to the RP for the group. As explained above, the RP for the inter- 720 domain group will be one of the other border routers of the domain. 722 If a source's data rate is high enough, DRs within the PIM-SM domain 723 may switch to the shortest path tree. If the shortest path to an 724 external source is via the group's ingress router for the shared 725 tree, the new (S,G) state in the BGMP border router will not cause 726 BGMP (S,G) Joins because that border router will already have (*,G) 727 state. 
If, however, the shortest path to an external source is via 728 some other border router, that border router will create (S,G) BGMP 729 state in response to the M-IGP (S,G) Join alert. In this case, 730 because there is no local (*,G) state to suppress it, the border 731 router will send a BGMP (S,G) Join to the next-hop external peer 733 Draft BGMP May 2003 735 towards S, in order to pull the data down directly. (See BR11 in 736 Figure 1.) As in normal PIM-SM operation, those PIM-SM routers that 737 have (*,G) and (S,G) state pointing to different incoming interfaces 738 will prune that source off the shared tree. Therefore, all internal 739 interfaces may eventually be pruned off the internal shared tree. 741 After the border router sends a BGMP (S,G) Join, if its (S,G) state 742 has the PR-bit clear, an (S,G) Poison-Reverse message (with the PR-bit 743 clear) is sent to the ingress router for G. The ingress router then 744 creates (S,G) state if it does not already exist, and removes the next hop 745 towards the nominal root of G from the target list. 747 If the border router later receives an (S,G) Poison-Reverse message 748 with the PR-bit set, the Poison-Reverse message is forwarded to the 749 ingress router for G. The best-exit router then creates (S,G) state 750 if it does not already exist, and puts the next hop towards the 751 nominal root of G in the target list if not already present. 753 6.4.3. Interaction with CBT 755 CBT builds bidirectional shared trees but must address two points of 756 compatibility with BGMP. First, CBT cannot accommodate more than 757 one border router injecting a packet. Therefore, if a CBT domain 758 does have multiple external connections, the M-IGP components of the 759 border routers are responsible for ensuring that only one of them 760 will inject data from any given source. 762 Second, CBT cannot process source-specific Joins or Prunes.
Two 763 options thus exist for each CBT domain: 765 Option A: 766 The CBT component interprets an (S,G) Join alert as if it were a 767 (*,G) Join alert, as described in [INTEROP]. That is, if it is not 768 already on the core-tree for G, then it sends a CBT (*,G) JOIN- 769 REQUEST message towards the core for G. Similarly, when the CBT 770 component receives an (S,G) Prune alert, and the child interface 771 list for a group is NULL, then it sends a (*,G) QUIT_NOTIFICATION 772 towards the core for G. This option has the disadvantage of 773 pulling all data for the group G down to the CBT domain when no 774 members exist. 776 Option B: 777 The CBT domain does not propagate any source routes (i.e., non- 778 class D routes) to its external peers for the Multicast RIB 780 Draft BGMP May 2003 782 unless it is known that no other path exists to that prefix (e.g., 783 routes for prefixes internal to the domain or in a singly-homed 784 customer's domain may be propagated). This ensures that source- 785 specific joins are never received unless the source's data already 786 passes through the domain on the shared tree, in which case the 787 (S,G) Join need not be propagated anyway. BGMP border routers will 788 only send source-specific Joins or Prunes to an external peer if 789 that external peer advertises source-prefixes in the EGP. If a 790 BGMP-CBT border router does receive an (S,G) Join or Prune, that 791 border router should ignore the message. 793 To minimize en/de-capsulations, CBTv2 BRs may follow the same 794 scheme as described under PIM-SM above, in which each BR sends 795 Candidate-Core advertisements for those groups for which it is the 796 shared-tree ingress router. 798 6.4.4. Interaction with MOSPF 800 As with CBTv2, MOSPF cannot process source-specific Joins or Prunes, 801 and the same two options are available.
Therefore, an MOSPF domain 802 may either: 804 Option A: 805 send a Group-Membership-LSA for all of G in response to a (S,G) 806 Join alert, and "prematurely age" it out (when no other downstream 807 members exist) in response to an (S,G) Prune alert, OR 809 Option B: 810 not propagate any source routes (i.e., non-class D routes) to their 811 external peers for the Multicast RIB unless it is known that no 812 other path exists to that prefix (e.g., routes for prefixes 813 internal to the domain or in a singly-homed customer's domain may 814 be propagated) 816 6.5. Operation over Multi-access Networks 818 Multiaccess links require special handling to prevent duplicates. 819 The following mechanism enables BGMP to operate over multiaccess 820 links which do not run an M-IGP. This avoids broadcast-and-prune 821 behavior and does not require (S,G) state. 823 To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF 824 message to exchange "forwarder preference" values for each prefix. 826 Draft BGMP May 2003 828 The peer with the highest forwarder preference becomes the designated 829 forwarder, with ties broken by lowest BGMP Identifier. The 830 designated forwarder is the router responsible for forwarding packets 831 up the tree, and is the peer to which joins will be sent. 833 When BGMP first learns that a route exists in the multicast RIB whose 834 next-hop interface is NOT the multiaccess link, the BGMP router sends 835 a BGMP FWDR_PREF message for the prefix, to all BGMP peers on the 836 LAN. The FWDR_PREF message contains a "forwarder preference value" 837 for the local router, and the same value MUST be sent to all peers on 838 the LAN. Likewise, when the prefix is no longer reachable, a 839 FWDR_PREF of 0 is sent to all peers on the LAN. 841 Whenever a BGMP router calculates the next-hop peer towards a 842 particular address, and that peer is reached over a BGMP-owned 843 multiaccess LAN, the designated forwarder is used instead. 
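The election rule above (highest forwarder preference, ties broken by lowest BGMP Identifier) might be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. The function name and the representation of BGMP Identifiers as integers are hypothetical. Treating a preference of 0 as "not a candidate" is an interpretation of the rule that a FWDR_PREF of 0 is sent when the prefix is no longer reachable.

```python
from typing import Dict, Optional


def elect_designated_forwarder(prefs: Dict[int, int]) -> Optional[int]:
    """Return the BGMP Identifier of the designated forwarder for a prefix.

    prefs maps each peer's BGMP Identifier (as an integer, an assumption)
    to the forwarder preference it advertised for the prefix.
    """
    # Assumption: a preference of 0 means the peer cannot reach the prefix.
    candidates = {ident: pref for ident, pref in prefs.items() if pref > 0}
    if not candidates:
        return None
    # Highest preference wins; ties are broken by lowest BGMP Identifier.
    return min(candidates, key=lambda ident: (-candidates[ident], ident))


advertised = {0x0A000001: 100, 0x0A000002: 200, 0x0A000003: 200}
winner = elect_designated_forwarder(advertised)
print(hex(winner) if winner is not None else None)
```

Because every router on the LAN receives the same preference values, each one running this computation independently should arrive at the same designated forwarder.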
845 When a BGMP router receives a FWDR_PREF message from a peer, it looks 846 up the matching route in its multicast RIB, and calculates the new 847 designated forwarder. If the router has tree state entries whose 848 parent target was the old forwarder, it sends Joins to the new 849 forwarder and Prunes to the old forwarder. 851 When a BGMP router which is NOT the designated forwarder receives a 852 packet on the multiaccess link, it is silently dropped. 854 Finally, this mechanism prevents duplicates where full peering exists 855 on a "logical" link. Where full peering does not exist, steps must 856 be taken (outside of BGMP) to present separate logical interfaces to 857 BGMP, each of which is a link with full peering. This might entail, 858 for example, using different link-layer address mappings, doing 859 encapsulation, or changing the physical media. 861 6.6. Interaction between (S,G) state and G-routes 863 As discussed earlier, routers with (*,G) state will not propagate (S,G) 864 joins. However, a special case occurs when (S,G) state coincides with 865 the G-route (or route towards the nominal root of G). When this occurs, 866 care must be taken so that the data will reach the root domain without 867 causing duplicates or black holes. For this reason, (S,G) state on the 868 path between the source and the root domain is annotated as being 869 "poison-reversed". A PR-bit is kept for this purpose, which is updated 870 by (UN)POISON_REVERSE messages. 872 The PR-bit indicates to BGMP nodes whether 873 they need to forward packets up towards the root domain. 874 For example, in a case where an (S,G) branch exists, 875 a transit domain may get packets along the (S,G) branch, 876 and needs to know whether to (also) forward them up 877 towards the root domain. If the domain in question 878 is on the path between S and the root domain, then 879 the answer is yes (and the PR bit will be set on the 880 S,G state). 
If the domain in question is not on the 881 path between S and the root domain, then the answer 882 is no (and the PR bit will be clear on the S,G state). 884 Draft BGMP May 2003 886 7. Message Formats 888 This section describes message formats used by BGMP. 890 Messages are sent over a reliable transport protocol connection. A 891 message is processed only after it is entirely received. The maximum 892 message size is 4096 octets. All implementations are required to 893 support this maximum message size. 895 All fields labelled "Reserved" below must be transmitted as 0, and 896 ignored upon receipt. 898 7.1. Message Header Format 900 Each message has a fixed-size (4-byte) header. There may or may not 901 be a data portion following the header, depending on the message 902 type. The layout of these fields is shown below: 904 0 1 2 3 905 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 906 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 907 | Length | Type | Reserved | 908 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 910 Length: 911 This 2-octet unsigned integer indicates the total length of the 912 message, including the header, in octets. Thus, e.g., it allows 913 one to locate in the transport-level stream the start of the next 914 message. The value of the Length field must always be at least 4 915 and no greater than 4096, and may be further constrained, depending 916 on the message type. No "padding" of extra data after the message 917 is allowed, so the Length field must have the smallest value 918 required given the rest of the message. 920 Type: 921 This 1-octet unsigned integer indicates the type code of the 922 message. The following type codes are defined: 924 1 - OPEN 925 2 - UPDATE 926 3 - NOTIFICATION 927 4 - KEEPALIVE 929 Draft BGMP May 2003 931 7.2. OPEN Message Format 933 After a transport protocol connection is established, the first 934 message sent by each side is an OPEN message. 
If the OPEN message is 935 acceptable, a KEEPALIVE message confirming the OPEN is sent back. 936 Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION 937 messages may be exchanged. 939 In addition to the fixed-size BGMP header, the OPEN message contains 940 the following fields: 942 0 1 2 3 943 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 944 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 945 | Version | Rsvd| AddrFam | Hold Time | 946 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 947 | BGMP Identifier (variable length) | 948 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 949 | | 950 + (Optional Parameters) | 951 | | 952 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 954 Version: 955 This 1-octet unsigned integer indicates the protocol version number 956 of the message. The current BGMP version number is 1. 958 AddrFam: 959 The IANA-assigned address family number of the BGMP Identifier. 960 These include (among others): 962 Number Description 963 ------ ----------- 964 1 IP (IP version 4) 965 2 IPv6 (IP version 6) 967 Hold Time: 968 This 2-octet unsigned integer indicates the number of seconds that 969 the sender proposes for the value of the Hold Timer. Upon receipt 970 of an OPEN message, a BGMP speaker MUST calculate the value of the 971 Hold Timer by using the smaller of its configured Hold Time and the 972 Hold Time received in the OPEN message. The Hold Time MUST be 974 Draft BGMP May 2003 976 either zero or at least three seconds. An implementation may 977 reject connections on the basis of the Hold Time. The calculated 978 value indicates the maximum number of seconds that may elapse 979 between the receipt of successive KEEPALIVE, and/or UPDATE messages 980 by the sender. 982 BGMP Identifier: 983 This 4-octet (for IPv4) or 16-octet (IPv6) unsigned integer 984 indicates the BGMP Identifier of the sender. 
A given BGMP speaker 985 sets the value of its BGMP Identifier to a globally-unique value 986 assigned to that BGMP speaker (e.g., an IPv4 address). The value 987 of the BGMP Identifier is determined on startup and is the same for 988 every BGMP session opened. 990 Optional Parameters: 991 This field may contain a list of optional parameters, where each 992 parameter is encoded as a (Parameter Type, Parameter Length, Parameter Value) triplet. The combined length of all optional 994 parameters can be derived from the Length field in the message 995 header. 997 0 1 998 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 999 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... 1000 | Parm. Type | Parm. Length | Parameter Value (variable) 1001 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... 1003 Parameter Type is a one octet field that unambiguously identifies 1004 individual parameters. Parameter Length is a one octet field that 1005 contains the length of the Parameter Value field in octets. 1006 Parameter Value is a variable length field that is interpreted 1007 according to the value of the Parameter Type field. 1009 This document defines the following Optional Parameters: 1011 a) Authentication Information (Parameter Type 1): 1012 This optional parameter may be used to authenticate a BGMP peer. 1013 The Parameter Value field contains a 1-octet Authentication Code 1014 followed by a variable length Authentication Data. 1016 0 1 2 3 4 5 6 7 8 1018 Draft BGMP May 2003 1020 +-+-+-+-+-+-+-+-+ 1021 | Auth. Code | 1022 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1023 | | 1024 | Authentication Data | 1025 | | 1026 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1028 Authentication Code: 1030 This 1-octet unsigned integer indicates the authentication 1031 mechanism being used.
Whenever an authentication mechanism is 1032 specified for use within BGMP, two things must be included in 1033 the specification: 1035 - the value of the Authentication Code which indicates use of the 1036 mechanism, and - the form and meaning of the Authentication Data. 1038 Note that a separate authentication mechanism may be used in 1039 establishing the transport level connection. 1041 Authentication Data: 1043 The form and meaning of this variable-length field 1044 depend on the Authentication Code. 1046 The minimum length of the OPEN message is 12 octets (including 1047 message header). 1049 b) Capability Information (Parameter Type 2): 1050 This is an Optional Parameter that is used by a BGMP-speaker to 1051 convey to its peer the list of capabilities supported by the 1052 speaker. The parameter contains one or more triples (Capability Code, Capability Length, Capability Value), where each triple is 1054 encoded as shown below: 1055 +------------------------------+ 1056 | Capability Code (1 octet) | 1057 +------------------------------+ 1058 | Capability Length (1 octet) | 1059 +------------------------------+ 1060 | Capability Value (variable) | 1061 +------------------------------+ 1062 Capability Code: 1064 Draft BGMP May 2003 1066 Capability Code is a one octet field that unambiguously identifies 1067 individual capabilities. 1069 Capability Length: 1071 Capability Length is a one octet field that contains the length of 1072 the Capability Value field in octets. 1074 Capability Value: 1076 Capability Value is a variable length field that is interpreted 1077 according to the value of the Capability Code field. 1079 A particular capability, as identified by its Capability Code, may 1080 occur more than once within the Optional Parameter. 1082 This document reserves Capability Codes 128-255 for vendor-specific 1083 applications. 1085 This document reserves Capability Code value 0.
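As a hedged illustration, the Parameter Value of a Capability Information parameter (a sequence of 1-octet Capability Code, 1-octet Capability Length, and variable-length Capability Value) could be walked as shown below. The function names are hypothetical, and raising ValueError on truncation is an assumption. How such an error maps to a NOTIFICATION is outside this sketch.

```python
from typing import List, Tuple


def parse_capabilities(value: bytes) -> List[Tuple[int, bytes]]:
    """Split a Capability Information Parameter Value into (code, value) pairs."""
    capabilities: List[Tuple[int, bytes]] = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated capability header")
        code = value[offset]
        length = value[offset + 1]
        offset += 2
        if offset + length > len(value):
            raise ValueError("capability value overruns the parameter")
        # The same code may legitimately appear more than once, so a list
        # of pairs is kept rather than a dictionary.
        capabilities.append((code, value[offset:offset + length]))
        offset += length
    return capabilities


def is_vendor_specific(code: int) -> bool:
    """Capability Codes 128-255 are reserved for vendor-specific use."""
    return 128 <= code <= 255


example = bytes([1, 2, 0xAA, 0xBB, 200, 0])
for cap_code, cap_value in parse_capabilities(example):
    print(cap_code, cap_value.hex(), is_vendor_specific(cap_code))
```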
1087 Capability Codes (other than those reserved for vendor specific use) 1088 are assigned only by the IETF consensus process and IESG approval. 1090 7.3. UPDATE Message Format 1092 UPDATE messages are used to transfer Join/Prune/FwdrPref information 1093 between BGMP peers. The UPDATE message always includes the fixed- 1094 size BGMP header, and one or more attributes as described below. 1096 The message format below allows compact encoding of (*,G) Joins and 1097 Prunes, while allowing the flexibility needed to do other updates 1098 such as (S,G) Joins and Prunes towards sources as well as on the 1099 shared tree. In the discussion below, an Encoded-Address-Prefix is 1100 of the form: 1101 0 1 2 3 1102 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1103 +-+-+-+-+-+-+-+-+ 1104 |EnTyp| AddrFam | 1105 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1106 | Address (variable length) | 1107 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1108 | Mask (variable length) | 1110 Draft BGMP May 2003 1112 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1114 EnTyp: 1115 0 - All 1's Mask. The Mask field is 0 bytes long. 1116 1 - Mask length included. The Mask field is 4 bytes long, and 1117 contains the mask length, in bits. 1118 2 - Full Mask included. The Mask field is the same length 1119 as the Address field, and contains the full bitmask. 1121 AddrFam: 1122 The IANA-assigned address family number of the encoded prefix. 1124 Address: 1125 The address associated with the given prefix to be encoded. The 1126 length is determined based on the Address Family. 1128 Mask: 1129 The mask associated with the given prefix. The format (or absence) 1130 of this field is determined by the EnTyp field. 
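The Encoded-Address-Prefix layout just described can be decoded along the following lines. This is a sketch under assumptions rather than a reference decoder. The function name is hypothetical, and only address families 1 (IPv4) and 2 (IPv6) are handled. It assumes the usual Internet-Draft bit numbering (bit 0 is the most significant), so EnTyp occupies the top three bits of the first octet. Rejecting a non-contiguous full mask is a design choice of this sketch, not something the text mandates.

```python
import ipaddress

# Address length in octets and network class per IANA address family number.
FAMILIES = {
    1: (4, ipaddress.IPv4Network),
    2: (16, ipaddress.IPv6Network),
}


def parse_encoded_prefix(buf: bytes, offset: int = 0):
    """Decode one Encoded-Address-Prefix; return (network, next_offset)."""
    if offset >= len(buf):
        raise ValueError("empty buffer")
    first = buf[offset]
    en_typ = first >> 5        # top 3 bits of the first octet
    addr_fam = first & 0x1F    # low 5 bits of the first octet
    if addr_fam not in FAMILIES:
        raise ValueError("unrecognized address family")
    addr_len, network_cls = FAMILIES[addr_fam]
    pos = offset + 1
    address = buf[pos:pos + addr_len]
    if len(address) != addr_len:
        raise ValueError("truncated address")
    pos += addr_len
    if en_typ == 0:
        # All 1's mask; the Mask field is absent.
        prefix_len = addr_len * 8
    elif en_typ == 1:
        # 4-byte Mask field carrying the mask length in bits.
        raw = buf[pos:pos + 4]
        if len(raw) != 4:
            raise ValueError("truncated mask length")
        prefix_len = int.from_bytes(raw, "big")
        pos += 4
    elif en_typ == 2:
        # Full bitmask, same length as the address.
        raw = buf[pos:pos + addr_len]
        if len(raw) != addr_len:
            raise ValueError("truncated mask")
        mask = int.from_bytes(raw, "big")
        bits = addr_len * 8
        prefix_len = bin(mask).count("1")
        if mask != ((1 << prefix_len) - 1) << (bits - prefix_len):
            raise ValueError("non-contiguous mask")
        pos += addr_len
    else:
        raise ValueError("unknown EnTyp")
    network = network_cls((bytes(address), prefix_len), strict=False)
    return network, pos


net, end = parse_encoded_prefix(bytes([0x21, 224, 1, 0, 0, 0, 0, 0, 16]))
print(net, end)
```

Returning the next offset lets a caller continue parsing any nested attributes that follow the prefix inside a GROUP or SOURCE attribute.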
1132 Each attribute is of the form: 1134 0 1 2 3 1135 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1136 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1137 | Length | Type | Data ... 1138 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1139 All attributes are 4-byte aligned. 1141 Length: 1142 The Length is the length of the entire attribute, including the 1143 length, type, and data fields. If other attributes are nested 1144 within the data field, the length includes the size of all such 1145 nested attributes. 1147 Type: 1149 Types 128-255 are reserved for "optional" attributes. If a 1150 required attribute is unrecognized, a NOTIFICATION will be sent and 1151 the connection will be closed if the error is a fatal one. 1152 Unrecognized optional attributes are simply ignored. 1154 0 - JOIN 1156 Draft BGMP May 2003 1158 1 - PRUNE 1159 2 - GROUP 1160 3 - SOURCE 1161 4 - FWDR_PREF 1162 5 - POISON_REVERSE 1164 a) JOIN (Type Code 0) 1166 The JOIN attribute indicates that all GROUP or SOURCE options 1167 nested immediately within the JOIN option should be joined. 1169 0 1 2 3 1170 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1171 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1172 | Length | Type=0 | Reserved | 1173 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1174 | Nested Attributes ... 1175 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1176 No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested 1177 within a JOIN attribute. 1179 b) PRUNE (Type Code 1) 1181 The PRUNE attribute indicates that all GROUP or SOURCE attributes 1182 nested immediately within the PRUNE attribute should be pruned. 
1184 0 1 2 3 1185 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1186 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1187 | Length | Type=1 | Reserved | 1188 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1189 | Nested Attributes ... 1190 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1191 No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested 1192 within a PRUNE attribute. 1194 c) GROUP (Type Code 2) 1196 The GROUP attribute identifies a given group-prefix. In addition, 1197 any attributes nested immediately within the GROUP attribute also 1198 apply to the given group-prefix. 1200 0 1 2 3 1201 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1202 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1204 Draft BGMP May 2003 1206 | Length | Type=2 | | 1207 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1208 | | 1209 | Encoded-Address-Prefix | 1210 | | 1211 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1212 | Nested Attributes (optional) ... 1213 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1214 Encoded-Address-Prefix 1215 The multicast group prefix to be joined or 1216 pruned, 1217 in the format described above. 1218 Nested Attributes No GROUP, SOURCE, or FWDR_PREF attributes may 1219 be 1220 immediately nested within a GROUP attribute. 1222 d) SOURCE (Type Code 3): 1224 The SOURCE attribute identifies a given source-prefix. In 1225 addition, any attributes nested immediately within the SOURCE 1226 attribute also apply to the given source-prefix.
1228 The SOURCE attribute has the following format: 1230 0 1 2 3 1231 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1232 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1233 | Length | Type=3 | | 1234 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1235 | | 1236 | Encoded-Address-Prefix | 1237 | | 1238 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1239 | Nested Attributes (optional) ... 1240 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1241 Encoded-Address-Prefix 1242 The Source-prefix in the format described 1243 above. 1244 Nested Attributes No GROUP, SOURCE, or FWDR_PREF attributes may 1245 be 1246 immediately nested within a SOURCE attribute. 1248 e) FWDR_PREF (Type Code 4) 1250 The FWDR_PREF attribute provides a forwarder preference value for 1252 Draft BGMP May 2003 1254 all GROUP or SOURCE attributes nested immediately within the 1255 FWDR_PREF attribute. It is used by a BGMP speaker to inform other 1256 BGMP speakers of the originating speaker's degree of preference for 1257 a given group or source prefix. Usage of this attribute is 1258 described in 6.5. 1260 0 1 2 3 1261 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1262 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1263 | Length | Type=4 | Reserved | 1264 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1265 | Preference Value | 1266 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1267 | Nested Attributes ... 1268 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1269 Preference Value A 32-bit non-negative integer. 1270 Nested Attributes No JOIN, PRUNE, or FWDR_PREF attributes may be 1271 immediately nested within a FWDR_PREF 1272 attribute.
1274 f) POISON_REVERSE (Type Code 5) 1276 The POISON_REVERSE attribute provides a "poison-reverse" (PR-bit) 1277 value for all SOURCE attributes nested immediately within the 1278 POISON_REVERSE attribute. It is used by a BGMP speaker to inform 1279 other BGMP speakers from which it has received (S,G) Joins that 1280 they are on the path of domains between the source and the root 1281 domain. 1283 0 1 2 3 1284 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1285 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1286 | Length | Type=5 | Reserved |P| 1287 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1288 | Nested Attributes ... 1289 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1290 P The PR-bit value. 1291 Nested Attributes No attributes defined in this document other than SOURCE 1292 may be immediately nested within a 1293 POISON_REVERSE 1294 attribute. 1296 Draft BGMP May 2003 1298 7.4. Encoding examples 1300 Below are enumerated examples of how various updates are built using 1301 nested attributes, where A ( B ) denotes that attribute B is nested 1302 within attribute A. 1303 (*,G-prefix) Join: JOIN ( GROUP ) 1304 (*,G-prefix) Prune: PRUNE ( GROUP ) 1305 (S,G) Join towards S : GROUP ( JOIN ( SOURCE ) ) 1306 (S,G) Join cancelling prune towards root of G: GROUP ( JOIN ( SOURCE ) ) 1307 (S,G) Prune towards S: GROUP ( PRUNE ( SOURCE ) ) 1308 (S,G) Prune towards root of G: GROUP ( PRUNE ( SOURCE ) ) 1309 Switch from (*,G) to (S,G): PRUNE ( GROUP ( JOIN ( SOURCE ) ) ) 1310 Switch from (S,G) to (*,G): JOIN ( GROUP ) 1311 Initial (*,G) Join with S pruned: JOIN ( GROUP ( PRUNE ( SOURCE ) ) ) 1312 Forwarder preference announcement for G-prefix: FWDR_PREF ( GROUP ) 1313 Forwarder preference announcement for S-prefix: FWDR_PREF ( SOURCE ) 1315 7.5. KEEPALIVE Message Format 1317 BGMP does not use any transport protocol-based keep-alive mechanism 1318 to determine if peers are reachable.
Instead, KEEPALIVE messages are 1319 exchanged between peers often enough as not to cause the Hold Timer 1320 to expire. A reasonable maximum time between the last KEEPALIVE or 1321 UPDATE message sent, and the time at which a KEEPALIVE message is 1322 sent, would be one third of the Hold Time interval. KEEPALIVE 1323 messages MUST NOT be sent more frequently than one per second. An 1324 implementation MAY adjust the rate at which it sends KEEPALIVE 1325 messages as a function of the Hold Time interval. 1327 If the negotiated Hold Time interval is zero, then periodic KEEPALIVE 1328 messages MUST NOT be sent. 1330 A KEEPALIVE message consists of only a message header, and has a 1331 length of 4 octets. 1333 7.6. NOTIFICATION Message Format 1335 A NOTIFICATION message is sent when an error condition is detected. 1336 The BGMP connection is closed immediately after sending it if the 1337 error is a fatal one. 1339 In addition to the fixed-size BGMP header, the NOTIFICATION message 1340 contains the following fields: 1342 Draft BGMP May 2003 1344 0 1 2 3 1345 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1346 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1347 |O| Error code | Error subcode | Data | 1348 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1349 | | 1350 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1352 O-bit: 1353 Open-bit. If clear, the connection will be closed. 1354 If set, indicates the error is not fatal. 1356 Error Code: 1358 This 1-octet unsigned integer indicates the type of 1359 NOTIFICATION. 
The following Error Codes have been defined:

1361     Error Code   Symbolic Name                 Reference

1363         1        Message Header Error          Section 8.1

1365         2        OPEN Message Error            Section 8.2

1367         3        UPDATE Message Error          Section 8.3

1369         4        Hold Timer Expired            Section 8.5

1371         5        Finite State Machine Error    Section 8.6

1373         6        Cease                         Section 8.7

1375  Error subcode:

1377     This 1-octet unsigned integer provides more specific
1378     information about the nature of the reported error. Each Error
1380     Code may have one or more Error Subcodes associated with it.
1381     If no appropriate Error Subcode is defined, then a zero
1382     (Unspecific) value is used for the Error Subcode field.
1383     The notation (MC) below indicates the error is a fatal one
1384     and the O-bit must be clear. Non-fatal subcodes SHOULD
1385     be sent with the O-bit set.

1387     Message Header Error subcodes:

1391         2 - Bad Message Length (MC)
1392         3 - Bad Message Type (MC)

1394     OPEN Message Error subcodes:

1396         1 - Unsupported Version (MC)
1397         4 - Unsupported Optional Parameter
1398         5 - Authentication Failure (MC)
1399         6 - Unacceptable Hold Time (MC)
1400         7 - Unsupported Capability (MC)

1402     UPDATE Message Error subcodes:

1404         1 - Malformed Attribute List (MC)
1405         2 - Unrecognized Attribute Type
1406         5 - Attribute Length Error (MC)
1407        10 - Invalid Address
1408        11 - Invalid Mask
1409        13 - Unrecognized Address Family
1410  Data:
1411     This variable-length field is used to diagnose the reason for the
1412     NOTIFICATION. The contents of the Data field depend upon the
1413     Error Code and Error Subcode. See Section 8 below for more
1414     details.

1416     Note that the length of the Data field can be determined from the
1417     message Length field by the formula:

1419        Message Length = 6 + Data Length

1421     The minimum length of the NOTIFICATION message is 6 octets
1422     (including message header).

1426  8.
BGMP Error Handling

1428  This section describes actions to be taken when errors are detected
1429  while processing BGMP messages. BGMP Error Handling is similar to
1430  that of BGP [BGP].

1432  When any of the conditions described here are detected, a
1433  NOTIFICATION message with the indicated Error Code, Error Subcode,
1434  and Data fields is sent, and the BGMP connection is closed if the
1435  error is a fatal one. If no Error Subcode is specified, then a zero
1436  must be used.

1438  The phrase "the BGMP connection is closed" means that the transport
1439  protocol connection has been closed and that all resources for that
1440  BGMP connection have been deallocated. The remote peer is removed
1441  from the target list of all tree state entries.

1443  Unless specified explicitly, the Data field of the NOTIFICATION
1444  message that is sent to indicate an error is empty.

1446  8.1. Message Header error handling

1448  All errors detected while processing the Message Header are indicated
1449  by sending the NOTIFICATION message with Error Code Message Header
1450  Error. The Error Subcode elaborates on the specific nature of the
1451  error.

1453  If the Length field of the message header is less than 4 or greater
1454  than 4096, or if the Length field of an OPEN message is less than
1455  the minimum length of the OPEN message, or if the Length field of an
1456  UPDATE message is less than the minimum length of the UPDATE message,
1457  or if the Length field of a KEEPALIVE message is not equal to 4, then
1458  the Error Subcode is set to Bad Message Length. The Data field
1459  contains the erroneous Length field.

1461  If the Type field of the message header is not recognized, then the
1462  Error Subcode is set to Bad Message Type. The Data field contains
1463  the erroneous Type field.

1465  8.2.
OPEN message error handling

1467  All errors detected while processing the OPEN message are indicated
1468  by sending the NOTIFICATION message with Error Code OPEN Message
1472  Error. The Error Subcode elaborates on the specific nature of the
1473  error.

1475  If the version number contained in the Version field of the received
1476  OPEN message is not supported, then the Error Subcode is set to
1477  Unsupported Version. The Data field is a 2-octet unsigned
1478  integer, which indicates the largest locally supported version number
1479  less than the version the remote BGMP peer bid (as indicated in the
1480  received OPEN message).

1482  If the Hold Time field of the OPEN message is unacceptable, then the
1483  Error Subcode MUST be set to Unacceptable Hold Time. An
1484  implementation MUST reject Hold Time values of one or two seconds.
1485  An implementation MAY reject any proposed Hold Time. An
1486  implementation which accepts a Hold Time MUST use the negotiated
1487  value for the Hold Time.

1489  If one of the Optional Parameters in the OPEN message is not
1490  recognized, then the Error Subcode is set to Unsupported Optional
1491  Parameter.

1493  If the OPEN message carries Authentication Information (as an
1494  Optional Parameter), then the corresponding authentication procedure
1495  is invoked. If the authentication procedure (based on Authentication
1496  Code and Authentication Data) fails, then the Error Subcode is set to
1497  Authentication Failure.

1499  If the OPEN message indicates that the peer does not support a
1500  capability which the receiver requires, the receiver may send a
1501  NOTIFICATION message to the peer, and terminate peering. The Error
1502  Subcode in the message is set to Unsupported Capability. The Data
1503  field in the NOTIFICATION message lists the set of capabilities that
1504  caused the speaker to send the message.
Each such capability is
1505  encoded the same way as it was encoded in the received OPEN message.

1507  8.3. UPDATE message error handling

1509  All errors detected while processing the UPDATE message are indicated
1510  by sending the NOTIFICATION message with Error Code UPDATE Message
1511  Error. The Error Subcode elaborates on the specific nature of the
1512  error.

1516  If any recognized attribute has an Attribute Length that conflicts
1517  with the expected length (based on the attribute type code), then the
1518  Error Subcode is set to Attribute Length Error. The Data field
1519  contains the erroneous attribute (type, length and value).

1521  If the Encoded-Address-Prefix field in some attribute is
1522  syntactically incorrect, then the Error Subcode is set to Invalid
1523  Prefix Field.

1525  If any other error is encountered when processing attributes (such as
1526  invalid nestings), then the Error Subcode is set to Malformed
1527  Attribute List, and the problematic attribute is included in the Data
1528  field.

1530  8.4. NOTIFICATION message error handling

1532  If a peer sends a NOTIFICATION message, and there is an error in that
1533  message, there is unfortunately no means of reporting this error via
1534  a subsequent NOTIFICATION message. Any such error, such as an
1535  unrecognized Error Code or Error Subcode, should be noticed, logged
1536  locally, and brought to the attention of the administration of the
1537  peer. The means to do this, however, lies outside the scope of this
1538  document.

1540  8.5. Hold Timer Expired error handling

1542  If a system does not receive successive KEEPALIVE and/or UPDATE
1543  and/or NOTIFICATION messages within the period specified in the Hold
1544  Time field of the OPEN message, then the NOTIFICATION message with
1545  Hold Timer Expired Error Code must be sent and the BGMP connection
1546  closed.

1548  8.6.
Finite State Machine error handling

1550  Any error detected by the BGMP Finite State Machine (e.g., receipt of
1551  an unexpected event) is indicated by sending the NOTIFICATION message
1552  with Error Code Finite State Machine Error.

1556  8.7. Cease

1558  In the absence of any fatal errors (that are indicated in this
1559  section), a BGMP peer may choose at any given time to close its BGMP
1560  connection by sending the NOTIFICATION message with Error Code Cease.
1561  However, the Cease NOTIFICATION message must not be used when a fatal
1562  error indicated by this section does exist.

1564  8.8. Connection collision detection

1566  If a pair of BGMP speakers try simultaneously to establish a TCP
1567  connection to each other, then two parallel connections between this
1568  pair of speakers might well be formed. We refer to this situation as
1569  connection collision. Clearly, one of these connections must be
1570  closed.

1572  Based on the value of the BGMP Identifier, a convention is established
1573  for detecting which BGMP connection is to be preserved when a
1574  collision does occur. The convention is to compare the BGMP
1575  Identifiers of the peers involved in the collision and to retain only
1576  the connection initiated by the BGMP speaker with the higher-valued
1577  BGMP Identifier.

1579  Upon receipt of an OPEN message, the local system must examine all of
1580  its connections that are in the OpenConfirm state. A BGMP speaker
1581  may also examine connections in an OpenSent state if it knows the
1582  BGMP Identifier of the peer by means outside of the protocol. If
1583  among these connections there is a connection to a remote BGMP
1584  speaker whose BGMP Identifier equals the one in the OPEN message,
1585  then the local system performs the following collision resolution
1586  procedure:

1588  1. The BGMP Identifier of the local system is compared to the BGMP
1589     Identifier of the remote system (as specified in the OPEN message).

1591  2.
If the value of the local BGMP Identifier is less than the remote
1592     one, the local system closes the BGMP connection that already exists
1593     (the one that is already in the OpenConfirm state), and accepts the
1594     BGMP connection initiated by the remote system.

1596  3. Otherwise, the local system closes the newly created BGMP connection
1597     (the one associated with the newly received OPEN message), and
1598     continues to use the existing one (the one that is already in the
1599     OpenConfirm state).

1603  Comparing BGMP Identifiers is done by treating them as (4-octet long)
1604  unsigned integers.

1606  A connection collision with an existing BGMP connection that is in
1607  the Established state causes unconditional closing of the newly
1608  created connection. Note that a connection collision cannot be
1609  detected with connections that are in Idle, Connect, or Active states.

1611  Closing the BGMP connection (that results from the collision
1612  resolution procedure) is accomplished by sending the NOTIFICATION
1613  message with the Error Code Cease.

1615  9. BGMP Version Negotiation

1617  BGMP speakers may negotiate the version of the protocol by making
1618  multiple attempts to open a BGMP connection, starting with the
1619  highest version number each supports. If an open attempt fails with
1620  an Error Code OPEN Message Error, and an Error Subcode Unsupported
1621  Version, then the BGMP speaker has available the version
1622  number it tried, the version number its peer tried, the version
1623  number passed by its peer in the NOTIFICATION message, and the
1624  version numbers that it supports. If the two peers do support one or
1625  more common versions, then this will allow them to rapidly determine
1626  the highest common version. In order to support BGMP version
1627  negotiation, future versions of BGMP must retain the format of the
1628  OPEN and NOTIFICATION messages.

1630  9.1.
BGMP Capability Negotiation

1632  When a BGMP speaker sends an OPEN message to its BGMP peer, the
1633  message may include an Optional Parameter, called Capabilities. The
1634  parameter lists the capabilities supported by the speaker.

1636  A BGMP speaker may use a particular capability when peering with
1637  another speaker only if both speakers support that capability. A
1638  BGMP speaker determines the capabilities supported by its peer by
1639  examining the list of capabilities present in the Capabilities
1640  Optional Parameter carried by the OPEN message that the speaker
1641  receives from the peer.

1645  10. BGMP Finite State Machine

1647  This section specifies BGMP operation in terms of a Finite State
1648  Machine (FSM). Following is a brief summary and overview of BGMP
1649  operations by state as determined by this FSM.

1651  Initially BGMP is in the Idle state.

1653  Idle state:

1655     In this state BGMP refuses all incoming BGMP connections. No
1656     resources are allocated to the peer. In response to the Start
1657     event (initiated by either system or operator) the local system
1658     initializes all BGMP resources, starts the ConnectRetry timer,
1659     initiates a transport connection to the other BGMP peer, while
1660     listening for a connection that may be initiated by the remote
1661     BGMP peer, and changes its state to Connect. The exact value of
1662     the ConnectRetry timer is a local matter, but should be
1663     sufficiently large to allow TCP initialization.

1665     If a BGMP speaker detects an error, it shuts down the connection
1666     and changes its state to Idle. Getting out of the Idle state
1667     requires generation of the Start event. If such an event is
1668     generated automatically, then persistent BGMP errors may result in
1669     persistent flapping of the speaker.
To avoid such a condition it
1670     is recommended that Start events should not be generated
1671     immediately for a peer that was previously transitioned to Idle
1672     due to an error. For a peer that was previously transitioned to
1673     Idle due to an error, the time between consecutive generation of
1674     Start events, if such events are generated automatically, shall
1675     exponentially increase. The value of the initial timer shall be 60
1676     seconds. The time shall be doubled for each consecutive retry.

1678     Any other event received in the Idle state is ignored.

1680  Connect state:

1682     In this state BGMP is waiting for the transport protocol
1683     connection to be completed.

1685     If the transport protocol connection succeeds, the local system
1686     clears the ConnectRetry timer, completes initialization, sends an
1687     OPEN message to its peer, and changes its state to OpenSent. If
1688     the transport protocol connection fails (e.g., retransmission
1689     timeout), the local system restarts the ConnectRetry timer,
1693     continues to listen for a connection that may be initiated by the
1694     remote BGMP peer, and changes its state to Active.

1696     In response to the ConnectRetry timer expired event, the local
1697     system restarts the ConnectRetry timer, initiates a transport
1698     connection to the other BGMP peer, continues to listen for a
1699     connection that may be initiated by the remote BGMP peer, and
1700     stays in the Connect state.

1702     The Start event is ignored in the Connect state.

1704     In response to any other event (initiated by either system or
1705     operator), the local system releases all BGMP resources associated
1706     with this connection and changes its state to Idle.

1708  Active state:

1710     In this state BGMP is trying to acquire a peer by listening for an
1711     incoming transport protocol connection.
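
      As an illustration of the back-off that the Idle state description
      recommends for automatically generated Start events, the following
      Python sketch computes the delay before each retry. The function
      name and the consecutive_errors counter are assumptions made for
      this example and are not part of the protocol. The 60-second
      initial value and the doubling come from the Idle state text,
      which gives no upper bound, so none is applied here.

```python
INITIAL_START_DELAY = 60  # seconds; initial timer value from the Idle state text


def start_event_delay(consecutive_errors: int) -> int:
    """Seconds to wait before automatically generating the next Start event.

    consecutive_errors is a hypothetical counter of how many times in a row
    the peer has been moved to Idle because of an error.
    """
    if consecutive_errors < 0:
        raise ValueError("consecutive_errors must not be negative")
    if consecutive_errors == 0:
        return 0  # no preceding error, so no back-off applies
    # 60 s before the first retry, doubled for each consecutive retry after it.
    return INITIAL_START_DELAY * 2 ** (consecutive_errors - 1)


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(attempt, start_event_delay(attempt))
```

      An implementation would presumably reset the counter once a
      connection has been re-established; when to do so is a local
      matter that the text above does not address.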
1713     If the transport protocol connection succeeds, the local system
1714     clears the ConnectRetry timer, completes initialization, sends an
1715     OPEN message to its peer, sets its Hold Timer to a large value,
1716     and changes its state to OpenSent. A Hold Timer value of 4
1717     minutes is suggested.

1719     In response to the ConnectRetry timer expired event, the local
1720     system restarts the ConnectRetry timer, initiates a transport
1721     connection to the other BGMP peer, continues to listen for a
1722     connection that may be initiated by the remote BGMP peer, and
1723     changes its state to Connect.

1725     If the local system detects that a remote peer is trying to
1726     establish a BGMP connection to it, and the IP address of the remote
1727     peer is not an expected one, the local system restarts the
1728     ConnectRetry timer, rejects the attempted connection, continues to
1729     listen for a connection that may be initiated by the remote BGMP
1730     peer, and stays in the Active state.

1732     The Start event is ignored in the Active state.

1734     In response to any other event (initiated by either system or
1735     operator), the local system releases all BGMP resources associated
1736     with this connection and changes its state to Idle.

1740  OpenSent state:

1742     In this state BGMP waits for an OPEN message from its peer. When
1743     an OPEN message is received, all fields are checked for
1744     correctness. If the BGMP message header checking or OPEN message
1745     checking detects an error (see Section 8.2), or a connection
1746     collision (see Section 8.8), the local system sends a NOTIFICATION
1747     message and changes its state to Idle.

1749     If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
1750     message and sets a KeepAlive timer. The Hold Timer, which was
1751     originally set to a large value (see above), is replaced with the
1752     negotiated Hold Time value (see Section 7.2).
If the negotiated
1753     Hold Time value is zero, then the Hold Timer and the KeepAlive
1754     timer are not started. If the value of the Autonomous System
1755     field is the same as the local Autonomous System number, then the
1756     connection is an "internal" connection; otherwise, it is
1757     "external". Finally, the state is changed to OpenConfirm.

1759     If a disconnect notification is received from the underlying
1760     transport protocol, the local system closes the BGMP connection,
1761     restarts the ConnectRetry timer, continues to listen for a
1762     connection that may be initiated by the remote BGMP peer, and goes
1763     into the Active state.

1765     If the Hold Timer expires, the local system sends a NOTIFICATION
1766     message with Error Code Hold Timer Expired and changes its state
1767     to Idle.

1769     In response to the Stop event (initiated by either system or
1770     operator) the local system sends a NOTIFICATION message with Error
1771     Code Cease and changes its state to Idle.

1773     The Start event is ignored in the OpenSent state.

1775     In response to any other event the local system sends a
1776     NOTIFICATION message with Error Code Finite State Machine Error and
1777     changes its state to Idle.

1779     Whenever BGMP changes its state from OpenSent to Idle, it closes
1780     the BGMP (and transport-level) connection and releases all
1781     resources associated with that connection.

1783  OpenConfirm state:

1787     In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.

1789     If the local system receives a KEEPALIVE message, it changes its
1790     state to Established.

1792     If the Hold Timer expires before a KEEPALIVE message is received,
1793     the local system sends a NOTIFICATION message with Error Code Hold
1794     Timer Expired and changes its state to Idle.

1796     If the local system receives a NOTIFICATION message, it changes
1797     its state to Idle.
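
      To illustrate what receiving a NOTIFICATION involves, the Python
      sketch below decodes the fields laid out in the NOTIFICATION
      Message Format section. It is a sketch under stated assumptions
      rather than a reference decoder: the names Notification and
      parse_notification_body are hypothetical, the body is taken to be
      the octets that follow the fixed-size message header, and the
      O-bit is assumed to be the most significant bit of the first body
      octet (bit 0 in the diagram), leaving 7 bits for the Error Code.

```python
from dataclasses import dataclass

MIN_NOTIFICATION_LENGTH = 6  # octets, including the message header


@dataclass
class Notification:
    open_bit: bool      # set: error is not fatal; clear: connection will be closed
    error_code: int
    error_subcode: int
    data: bytes

    @property
    def fatal(self) -> bool:
        return not self.open_bit


def parse_notification_body(message_length: int, body: bytes) -> Notification:
    """Decode the octets that follow the message header of a NOTIFICATION.

    message_length is the value of the header Length field.
    """
    if message_length < MIN_NOTIFICATION_LENGTH:
        raise ValueError("NOTIFICATION shorter than 6 octets")
    # Message Length = 6 + Data Length
    data_length = message_length - MIN_NOTIFICATION_LENGTH
    if len(body) != 2 + data_length:
        raise ValueError("body size does not match the message Length field")
    first_octet = body[0]
    return Notification(
        open_bit=bool(first_octet & 0x80),  # assumed position of the O-bit
        error_code=first_octet & 0x7F,
        error_subcode=body[1],
        data=bytes(body[2:]),
    )


if __name__ == "__main__":
    # Hold Timer Expired (Error Code 4), O-bit clear, empty Data field.
    print(parse_notification_body(6, bytes([0x04, 0x00])))
```

      Keeping the O-bit as a separate field lets the caller decide how
      to react to a non-fatal report without re-inspecting the raw
      octet.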
1799     If the KeepAlive timer expires, the local system sends a KEEPALIVE
1800     message and restarts its KeepAlive timer.

1802     If a disconnect notification is received from the underlying
1803     transport protocol, the local system changes its state to Idle.

1805     In response to the Stop event (initiated by either system or
1806     operator) the local system sends a NOTIFICATION message with Error
1807     Code Cease and changes its state to Idle.

1809     The Start event is ignored in the OpenConfirm state.

1811     In response to any other event the local system sends a
1812     NOTIFICATION message with Error Code Finite State Machine Error and
1813     changes its state to Idle.

1815     Whenever BGMP changes its state from OpenConfirm to Idle, it
1816     closes the BGMP (and transport-level) connection and releases all
1817     resources associated with that connection.

1819  Established state:

1821     In the Established state BGMP can exchange UPDATE, NOTIFICATION,
1822     and KEEPALIVE messages with its peer.

1824     If the local system receives an UPDATE or KEEPALIVE message, it
1825     restarts its Hold Timer, if the negotiated Hold Time value is
1826     non-zero.

1828     If the local system receives a NOTIFICATION message, it changes
1829     its state to Idle.

1831     If the local system receives an UPDATE message and the UPDATE
1835     message error handling procedure (see Section 8.3) detects an
1836     error, the local system sends a NOTIFICATION message and changes
1837     its state to Idle.

1839     If a disconnect notification is received from the underlying
1840     transport protocol, the local system changes its state to Idle.

1842     If the Hold Timer expires, the local system sends a NOTIFICATION
1843     message with Error Code Hold Timer Expired and changes its state
1844     to Idle.

1846     If the KeepAlive timer expires, the local system sends a KEEPALIVE
1847     message and restarts its KeepAlive timer.
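
      The value loaded into the KeepAlive timer can be derived from the
      negotiated Hold Time. The Python sketch below is one way to do
      so, assuming the guidance in the KEEPALIVE Message Format section
      (one third of the Hold Time, never more than one KEEPALIVE per
      second, none at all when the Hold Time is zero). The function
      name and the use of None to mean "timer not started" are
      assumptions made for this example.

```python
from typing import Optional

MIN_KEEPALIVE_INTERVAL = 1.0  # seconds; no more than one KEEPALIVE per second


def keepalive_interval(hold_time: float) -> Optional[float]:
    """Return the KeepAlive timer value in seconds for a negotiated Hold Time.

    None means the KeepAlive timer is not started at all.
    """
    if hold_time < 0:
        raise ValueError("hold_time must not be negative")
    if hold_time == 0:
        return None  # periodic KEEPALIVE messages MUST NOT be sent
    # One third of the Hold Time, floored at one second.
    return max(hold_time / 3.0, MIN_KEEPALIVE_INTERVAL)


if __name__ == "__main__":
    for hold in (0, 3, 90, 240):
        print(hold, keepalive_interval(hold))
```

      An implementation may pick a shorter interval than one third of
      the Hold Time, as long as it stays at or above the one-second
      floor.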
1849     Each time the local system sends a KEEPALIVE or UPDATE message, it
1850     restarts its KeepAlive timer, unless the negotiated Hold Time
1851     value is zero.

1853     In response to the Stop event (initiated by either system or
1854     operator), the local system sends a NOTIFICATION message with
1855     Error Code Cease and changes its state to Idle.

1857     The Start event is ignored in the Established state.

1859     In response to any other event, the local system sends a
1860     NOTIFICATION message with Error Code Finite State Machine Error
1861     and changes its state to Idle.

1863     Whenever BGMP changes its state from Established to Idle, it
1864     closes the BGMP (and transport-level) connection, releases all
1865     resources associated with that connection, and deletes all routes
1866     derived from that connection.

1868  11. Security Considerations

1870  BGMP uses TCP sessions for all network communication between peers. TCP
1871  sessions may be secured through the use of IPsec [IPSEC].

1873  12. Authors' Addresses

1875     Dave Thaler
1876     Microsoft
1880     One Microsoft Way
1881     Redmond, WA 98052
1882     EMail: dthaler@microsoft.com

1884  13. Normative References

1886  [INTEROP]
1887     Thaler, D., "Interoperability Rules for Multicast Routing
1888     Protocols", RFC 2715, October 1999.

1890  [IPSEC]
1891     Kent, S., and R. Atkinson, "Security Architecture for the Internet
1892     Protocol", RFC 2401, November 1998.

1894  [RFC2119]
1895     Bradner, S., "Key words for use in RFCs to Indicate Requirement
1896     Levels", BCP 14, RFC 2119, March 1997.

1898  [V4PREFIX]
1899     Thaler, D., "Unicast-Prefix-based IPv4 Multicast Addresses",
1900     draft-ietf-mboned-ipv4-uni-based-mcast-00.txt, Work in progress,
1901     June 2002.

1903  [V6PREFIX]
1904     Haberman, B., and D. Thaler, "Unicast-Prefix-based IPv6 Multicast
1905     Addresses", RFC 3306, August 2002.

1907  14. Non-normative References

1909  [BGP]
1910     Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
1911     1771, March 1995.
1913  [MBGP]
1914     Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
1915     Extensions for BGP-4", RFC 2858, June 2000.

1917  [CBT]
1918     Ballardie, A., "Core Based Trees (CBT version 2) Multicast
1919     Routing", RFC 2189, September 1997.

1923  [DVMRP]
1924     Pusateri, T., "Distance Vector Multicast Routing Protocol",
1925     draft-ietf-idmr-dvmrp-v3-10.txt, Work in progress, August 2000.

1927  [DWR]
1928     Fenner, W., "Domain-Wide Reports",
1929     draft-ietf-idmr-membership-reports-04.txt, Work in progress,
         August 1999.

1931  [IPv6AA]
1932     Hinden, R., and S. Deering, "IP Version 6 Addressing Architecture",
1933     RFC 3513, April 2003.

1935  [MOSPF]
1936     Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
1937     1994.

1939  [PIMDM]
1940     Adams, A., Nicholas, J., and W. Siadak, "Protocol Independent
1941     Multicast - Dense Mode (PIM-DM): Protocol Specification (Revised)",
1942     draft-ietf-pim-dm-new-v2-01.txt, Work in progress, February 2002.

1944  [PIMSM]
1945     Fenner, et al., "Protocol Independent Multicast - Sparse Mode
1946     (PIM-SM): Protocol Specification (Revised)",
         draft-ietf-pim-sm-v2-new-07.txt, Work in progress,
1947     March 2003.

1949  [REFLECT]
1950     Bates, T., and R. Chandra, "BGP Route Reflection: An alternative to
1951     full mesh IBGP", RFC 1966, June 1996.

1953  15. Full Copyright Statement

1955  Copyright (C) The Internet Society (2003). All Rights Reserved.

1957  This document and translations of it may be copied and furnished to
1958  others, and derivative works that comment on or otherwise explain it or
1959  assist in its implementation may be prepared, copied, published and
1960  distributed, in whole or in part, without restriction of any kind,
1961  provided that the above copyright notice and this paragraph are included
1962  on all such copies and derivative works.
However, this document itself
1963  may not be modified in any way, such as by removing the copyright notice
1964  or references to the Internet Society or other Internet organizations,
1965  except as needed for the purpose of developing Internet standards in
1966  which case the procedures for copyrights defined in the Internet
      Standards process must be followed, or as required to translate it into
1969  languages other than English.

1971  The limited permissions granted above are perpetual and will not be
1972  revoked by the Internet Society or its successors or assigns.

1974  This document and the information contained herein is provided on an "AS
1975  IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
1976  FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
1977  LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
1978  INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
1979  FITNESS FOR A PARTICULAR PURPOSE.