BGMP Working Group                                             D. Thaler
Internet Engineering Task Force                                Microsoft
INTERNET-DRAFT                                                 June 2003
Expires November 2003

               Border Gateway Multicast Protocol (BGMP):
                        Protocol Specification

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

This document describes BGMP, a protocol for inter-domain multicast
routing. BGMP builds shared trees for active multicast groups, and
optionally allows receiver domains to build source-specific,
inter-domain, distribution branches where needed. BGMP natively supports

Draft                              BGMP                        June 2003

"source-specific multicast" (SSM). To also support "any-source
multicast" (ASM), BGMP requires that each multicast group be associated
with a single root (in BGMP it is referred to as the root domain). It
requires that different ranges of the class D space are associated
(e.g., with Unicast-Prefix-Based Multicast addressing) with different
domains. Each of these domains then becomes the root of the shared
domain-trees for all groups in its range. Multicast participants will
generally receive better multicast service if the session initiator's
address allocator selects addresses from its own domain's part of the
space, thereby causing the root domain to be local to at least one of
the session participants.

NOTE:
This specification is published for the general information of the
Internet technical community and as an archival record of the work
done. The operational community generally agrees that this protocol
is not deployable in its current form; it is being published in the
hopes that it may provide a useful starting point for future work.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

1. Acknowledgements

In addition to the editors, the following individuals have
contributed to the design of BGMP: Cengiz Alaettinoglu, Tony
Ballardie, Steve Casner, Steve Deering, Deborah Estrin, Dino
Farinacci, Bill Fenner, Mark Handley, Ahmed Helmy, Van Jacobson, Dave
Meyer, and Satish Kumar.

This document is the product of the IETF BGMP Working Group with Dave
Thaler as editor.

Rusty Eddy, Isidor Kouvelas, and Pavlin Radoslavov also provided
valuable feedback on this document.

2. Purpose

It has been suggested that inter-domain "any-source" multicast is
better supported with a rendezvous mechanism whereby members receive
source's data packets without any sort of global broadcast (e.g.,
MSDP broadcasts source information, PIM-DM and DVMRP broadcast
initial data packets, and MOSPF broadcasts membership information).
PIM-SM [PIMSM] and CBT [CBT] use a shared group-tree, to which all
members join and thereby hear from all sources (and to which
non-members do not join and thereby hear from no sources).

This document describes BGMP, a protocol for inter-domain multicast
routing. BGMP natively supports "source-specific multicast" (SSM).
To also support "any-source multicast" (ASM), BGMP builds shared
trees for active multicast groups, and allows domains to build
source-specific, inter-domain, distribution branches where needed.
Building upon concepts from PIM-SM and CBT, BGMP requires that each
global multicast group be associated with a single root. However, in
BGMP, the root is an entire exchange or domain, rather than a single
router.

For non-source-specific groups, BGMP assumes that ranges of the
multicast address space have been associated (e.g., with
Unicast-Prefix-Based Multicast [V4PREFIX,V6PREFIX] addressing) with
selected domains. Each such domain then becomes the root of the shared
domain-trees for all groups in its range.
An address allocator will generally achieve
better distribution trees if it takes its multicast addresses from
its own domain's part of the space, thereby causing the root domain
to be local.

BGMP uses TCP as its transport protocol. This eliminates the need to
implement message fragmentation, retransmission, acknowledgement, and
sequencing. BGMP uses TCP port 264 for establishing its connections.
This port is distinct from BGP's port to provide protocol
independence, and to facilitate distinguishing between protocol
packets (e.g., by packet classifiers, diagnostic utilities, etc.).

Two BGMP peers form a TCP connection between one another, and
exchange messages to open and confirm the connection parameters.
They then send incremental Join/Prune Updates as group memberships
change. BGMP does not require periodic refresh of individual
entries. KeepAlive messages are sent periodically to ensure the
liveness of the connection. Notification messages are sent in
response to errors or special conditions. If a connection encounters
an error condition, a notification message is sent and the connection
is closed if the error is a fatal one.

3. Revision History

29 June 2003 draft-01

(1) Removed all references to MASC and G-RIB. The current spec only
    covers BGMP operation for source-specific groups, and
    any-source-multicast using unicast prefix-based multicast
    addresses (for both IPv4 and IPv6). No new routes of any type are
    needed in the routing table.

(2) Removed section on transitioning away from using DVMRP as the
    backbone to an AS-based multicast routing system with MBGP, as
    this has already happened.

4. Terminology

This document uses the following technical terms:

Domain:
   A set of one or more contiguous links and zero or more routers
   surrounded by one or more multicast border routers.
   Note that this loose definition of domain also applies to an
   external link between two domains, as well as an exchange.

Root Domain:
   When constructing a shared tree of domains for some group, one
   domain will be the "root" of the tree. The root domain receives
   data from each sender to the group, and functions as a rendezvous
   domain toward which member domains can send inter-domain joins, and
   to which sender domains can send data.

Multicast RIB:
   The Routing Information Base, or routing table, used to calculate
   the "next-hop" towards a particular address for multicast traffic.

Multicast IGP (M-IGP):
   A generic term for any multicast routing protocol used for tree
   construction within a domain. Typical examples of M-IGPs are:
   PIM-SM, PIM-DM, DVMRP, MOSPF, and CBT.

EGP:
   A generic term for the interdomain unicast routing protocol in use.
   Typically, this will be some version of BGP which can support a
   Multicast RIB, such as MBGP [MBGP], containing both unicast and
   multicast address prefixes.

Component:
   The portion of a border router associated with (and logically
   inside) a particular domain that runs the multicast IGP (M-IGP) for
   that domain, if any. Each border router thus has zero or more
   components inside routing domains. In addition, each border router
   with external links that do not fall inside any routing domain will
   have an inter-domain component that runs BGMP.

External peer:
   A border router in another multicast AS (autonomous system, as used
   in BGP), to which a BGMP TCP-connection is open. If BGP is being
   used as the EGP, a separate "eBGP" TCP-connection will also be open
   to the same peer.

Internal peer:
   Another border router of the same multicast AS.
   If BGP is being used as the EGP, the border router either speaks
   iBGP ("internal" BGP) directly to internal peers in a full mesh, or
   indirectly through a route reflector [REFLECT].

Next-hop peer:
   The next-hop peer towards a given IP address is the next EGP router
   on the path to the given address, according to multicast RIB routes
   in the EGP's routing table (e.g., in MBGP, routes whose Subsequent
   Address Family Identifier field indicates that the route is valid
   for multicast traffic).

target:
   Either an EGP peer, or an M-IGP component.

Tree State Table:
   This is a table of (S-prefix,G) and (*,G-prefix) entries that have
   been explicitly joined by a set of targets. Each entry has, in
   addition to the source and group addresses and masks, a list of
   targets that have explicitly requested data (on behalf of directly
   connected hosts or downstream routers). (S,G) entries also have an
   "SPT" bit.

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in
this document are to be interpreted as described in [RFC2119].

5. Protocol Overview

BGMP maintains group-prefix state in response to messages from BGMP
peers and notifications from M-IGP components. Group-shared trees
are rooted at the domain advertising the group prefix covering those
groups. When a receiver joins a specific group address, the border
router towards the root domain generates a group-specific Join
message, which is then forwarded Border-Router-by-Border-Router
towards the root domain (see Figure 1). BGMP Join and Prune messages
are sent over TCP connections between BGMP peers, and BGMP protocol
state is refreshed by KEEPALIVE messages periodically sent over TCP.

BGMP routers build group-specific bidirectional forwarding state as
they process the BGMP Join messages.
Bidirectional forwarding state
means that packets received from any target are forwarded to all
other targets in the target list without any RPF checks. No
group-specific state or traffic exists in parts of the network where
there are no members of that group.

BGMP routers optionally build source-specific unidirectional
forwarding state, only where needed, to be compatible with
source-specific trees (SPTs) used by some M-IGPs (e.g., DVMRP, PIM-DM,
or PIM-SM), or to construct trees for source-specific groups. A domain
that uses an SPT-based M-IGP may need to inject multicast packets
from external sources via different border routers (to be compatible
with the M-IGP RPF checks) which thus act as "surrogates". For
example, in the Transit_1 domain, data from Src_A arrives at BR12,
but must be injected by BR11. A surrogate router may create a
source-specific BGMP branch if no shared tree state exists. Note:
stub domains with a single border router, such as Rcvr_Stub_7 in
Figure 1, receive all multicast data packets through that router, to
which all RPF checks point. Therefore, stub domains never build
source-specific state.

                        Root_Domain
            [BR91]--------------------------\
               |                            |
            [BR32]                       [BR41]
           Transit_3                    Transit_4
            [BR31]                 [BR42]    [BR43]
               |                      |         |
            [BR22]                 [BR52]    [BR53]
           Transit_2                   Transit_5
            [BR21]                       [BR51]
               |                            |
            [BR12]                       [BR61]
           Transit_1[BR11]----------[BR62]Stub_6
            [BR13]                        (Src_A)
               |                          (Rcvr_D)
      -------------------
      |                 |
   [BR71]            [BR81]
 Rcvr_Stub_7    Src_only_Stub_8
  (Rcvr_C)          (Src_B)

Figure 1: Example inter-domain topology. [BRXY] represents a BGMP
border router. Transit_X is a transit domain network. *_Stub_X is a
stub domain network.

Data packets are forwarded based on a combination of BGMP and M-IGP
rules.
The router forwards to a set of targets according to a
matching (S,G) BGMP tree state entry if it exists. If not found, the
router checks for a matching (*,G) BGMP tree state entry. If neither
is found, then the packet is sent natively to the next-hop EGP peer
for G, according to the Multicast RIB (for example, in the case of a
non-member sender such as Src_B in Figure 1). If a matching entry
was found, the packet is forwarded to all other targets in the target
list. In this way BGMP trees forward data in a bidirectional manner.
If a target is an M-IGP component then forwarding is subject to the
rules of that M-IGP protocol.

5.1. Design Rationale

Several other protocols, or protocol proposals, build shared trees
within domains [PIMSM, CBT]. The design choices made for BGMP
result from our focus on Inter-Domain multicast in particular. The
design choices made by PIM-SM and CBT are better suited to the
wide-area intra-domain case. There are three major differences between
BGMP and other shared-tree protocols:

(1) Unidirectional vs. Bidirectional trees

Bidirectional trees (using bidirectional forwarding state as
described above) minimize third party dependence, which is essential
in the inter-domain context. For example, in Figure 1, stub domains
7 and 8 would like to exchange multicast packets without being
dependent on the quality of connectivity of the root domain.
However, unidirectional shared trees (i.e., those using RPF checks)
have more aggressive loop prevention and share the same processing
rules as source-specific entries, which are inherently unidirectional.

The lack of third party dependence concerns in the INTRA domain case
reduces the incentive to employ bidirectional trees. BGMP supports
bidirectional trees because it has to, and because it can without
excessive cost.

(2) Source-specific distribution trees/branches

In a departure from other shared tree protocols, source-specific BGMP
state is built ONLY where (a) it is needed to pull the multicast
traffic down to a BGMP router that has source-specific (S,G) state,
and (b) that router is NOT already on the shared tree (i.e., has no
(*,G) state), and (c) that router does not want to receive packets
via encapsulation from a router which is on the shared tree. BGMP
provides source-specific branches because most M-IGP protocols in use
today build source-specific trees. BGMP's source-specific branches
eliminate the unnecessary overhead of encapsulations for high data
rate sources from the shared tree's ingress router to the surrogate
injector (e.g., from BR12 to BR11 in Figure 1). Moreover, cases in
which shared paths are significantly longer than SPT paths will also
benefit.

However, except for source-specific group distribution trees, we do
not build source-specific inter-domain trees in general because (a)
inter-domain connectivity is generally less rich than intra-domain
connectivity, so shared distribution trees should have more
acceptable path length and traffic concentration properties in the
inter-domain context than in the intra-domain case, and (b) by
having the shared tree state always take precedence over
source-specific tree state, we avoid ambiguities that can otherwise
arise.

In summary, BGMP trees are, in a sense, a hybrid between PIM-SM and
CBT trees.

(3) Method of choosing root of group shared tree

The choice of a group's shared-tree-root has implications for
performance and policy. In the intra-domain case it is sometimes
assumed that all potential shared-tree roots (RPs/Cores) within the
domain are equally suited to be the root for a group that is
initiated within that domain.
In the INTER-domain case, there is far
more opportunity for unacceptably poor locality, and administrative
control of a group's shared-tree root. Therefore in the intra-domain
case, other protocols sometimes treat all candidate roots (RPs or
Cores) as equivalent and emphasize load sharing and stability to
maximize performance. In the Inter-Domain case, all roots are not
equivalent, and we adopt an approach whereby a group's root domain is
not random but is subject to administrative control.

6. Protocol Details

In this section, we describe the detailed protocol that border
routers perform. We assume that each border router conforms to the
component-based model described in [INTEROP], modulo one correction
to section 3.2 ("BGMP" Dispatcher), as follows:

   The iif owner of a (*,G) entry is the component owning the next-hop
   interface towards the nominal root of G, in the multicast RIB.

6.1. Interaction with the EGP

The fundamental requirements imposed by BGMP are that:

(1) For a given source-specific group and source, BGMP must be able
    to look up the next-hop towards the source in the Multicast RIB,
    and

(2) For a given non-source-specific group, BGMP will map the group
    address to a nominal "root" address, and must be able to look up
    the next-hop towards that address in the Multicast RIB.

BGMP determines the nominal "root" address as follows. If the multicast
address is a Unicast-Prefix-based Multicast address (for either IPv4 or
IPv6), then the nominal root address is the embedded unicast prefix,
padded with a suffix of 0 bits to form a full address.

For example, if the IPv6 group address is
ff2e:0100:1234:5678:9abc:def0::123, then the unicast prefix is
1234:5678:9abc:def0::/64, and the nominal root address would be
1234:5678:9abc:def0::. (This address is in fact the Subnet-Router
anycast address [IPv6AA].)
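
The mapping from a group address to its nominal root address can be
sketched as follows. This is an illustration only and not part of the
protocol. The function name nominal_root is hypothetical. The IPv6
branch assumes that the embedded unicast prefix is the 64 bits held in
octets 4 through 11 of the group address, which matches the example
above. The IPv4 branch assumes a 24-bit prefix carried in the last
three octets of a 225/8 address, which is inferred from the IPv4
example in this section.

```python
import ipaddress


def nominal_root(group: str) -> str:
    """Return the nominal root address for a unicast-prefix-based group.

    Assumed layouts (illustrative, taken from the examples in the text):
      IPv6: octets 4..11 hold a 64-bit unicast prefix.
      IPv4: 225.a.b.c embeds the 24-bit unicast prefix a.b.c.
    """
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    raw = addr.packed
    if addr.version == 6:
        # Keep the embedded prefix and pad with 64 zero bits.
        return str(ipaddress.IPv6Address(raw[4:12] + bytes(8)))
    if raw[0] != 225:
        raise ValueError(f"{group} is outside the assumed 225/8 range")
    # Keep the embedded prefix and pad with 8 zero bits.
    return str(ipaddress.IPv4Address(raw[1:4] + bytes(1)))


print(nominal_root("ff2e:0100:1234:5678:9abc:def0::123"))  # 1234:5678:9abc:def0::
print(nominal_root("225.1.2.3"))                            # 1.2.3.0
```

The result is the address that is looked up in the Multicast RIB to
find the next-hop towards the root domain.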

As an IPv4 example, if the IPv4 group address were 225.1.2.3, then the
nominal root address would be 1.2.3.0.

Support for any-source-multicast using any address other than a
Unicast-prefix-based Multicast Address is outside the scope of this
document.

6.2. Multicast Data Packet Processing

For BGMP rules to be applied, an incoming packet must first be
"accepted":

o  If the packet arrived on an interface owned by an M-IGP, the M-IGP
   component determines whether the packet should be accepted or
   dropped according to its rules. If the packet is accepted, the
   packet is forwarded (or not forwarded) out any other interfaces
   owned by the same component, as specified by the M-IGP.

o  If the packet was received over a point-to-point interface owned
   by BGMP, the packet is accepted.

o  If the packet arrived on a multiaccess network interface owned by
   BGMP, the packet is accepted if the router is receiving data on a
   source-specific branch, or if it is the designated forwarder for
   the longest matching route for S or for the longest matching route
   for the nominal root of G.

If the packet is accepted, then the router checks the tree state
table for a matching (S,G) entry. If one is found, but the packet
was not received from the next hop target towards S (if the entry's
SPT bit is True), or was not received from the next hop target
towards G (if the entry's SPT bit is False) then the packet is
dropped and no further actions are taken. If no (S,G) entry was
found, the router then checks for a matching (*,G) entry.

If neither is found, then the packet is forwarded towards the
next-hop peer for the nominal root of G, according to the Multicast
RIB. If a matching entry was found, the packet is forwarded to all
other targets in the target list.
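
The lookup order described above can be sketched as follows. This is
a simplified illustration and not a normative algorithm. The names
Entry, TreeStateTable, and forward are hypothetical, targets are
represented as plain strings, entries are keyed by exact addresses
rather than by longest-match prefixes, and the next_hop callable
stands in for a Multicast RIB lookup. The packet is assumed to have
already been accepted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    targets: List[str] = field(default_factory=list)
    spt: bool = False  # only meaningful for (S,G) entries


class TreeStateTable:
    def __init__(self, next_hop: Callable[[str], str]):
        # next_hop(address) stands in for a Multicast RIB lookup and
        # returns the next-hop target towards that address.
        self.next_hop = next_hop
        self.sg: Dict[Tuple[str, str], Entry] = {}
        self.star_g: Dict[str, Entry] = {}

    def forward(self, src: str, group: str, root: str,
                arrived_from: str) -> List[str]:
        """Return the targets an accepted packet is forwarded to."""
        entry = self.sg.get((src, group))
        if entry is not None:
            # (S,G) match: check the arrival target against S or the root.
            expected = self.next_hop(src if entry.spt else root)
            if arrived_from != expected:
                return []  # dropped
        else:
            entry = self.star_g.get(group)
        if entry is None:
            # No state at all: send towards the nominal root of G.
            return [self.next_hop(root)]
        # Bidirectional forwarding: every target except the arrival one.
        return [t for t in entry.targets if t != arrived_from]


rib = {"root": "peer_up", "src_a": "peer_src"}
table = TreeStateTable(rib.__getitem__)
table.star_g["G"] = Entry(["peer_up", "migp", "peer_down"])
print(table.forward("src_x", "G", "root", "migp"))  # ['peer_up', 'peer_down']
print(table.forward("src_x", "H", "root", "migp"))  # ['peer_up']
```

Forwarding to a returned target that is an M-IGP component is still
subject to that component's own rules, which this sketch does not
model.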

Forwarding to a target which is an M-IGP component means that the
packet is forwarded out any interfaces owned by that component
according to that component's multicast forwarding rules.

6.3. BGMP processing of Join and Prune messages and notifications

6.3.1. Receiving Joins

When the BGMP component receives a (*,G) or (S,G) Join alert from
another component, or a BGMP (S,G) or (*,G) Join message from an
external peer, it searches the tree state table for a matching entry.
If an entry is found, and that peer is already listed in the target
list, then no further actions are taken.

Otherwise, if no (*,G) or (S,G) entry was found, one is created. In
the case of a (*,G), the target list is initialized to contain the
next-hop peer towards the nominal root of G, if it is an external
peer. If the peer is internal, the target list is initialized to
contain the M-IGP component owning the next-hop interface. If there
is no next-hop peer (because the nominal root of G is inside the
domain), then the target list is initialized to contain the next-hop
component. If an (S,G) entry exists for the same G for which the
(*,G) Join is being processed, and the next-hop peers toward S and
the nominal root of G are different, the BGMP router must first send
an (S,G) Prune message toward the source and clear the SPT bit on the
(S,G) entry, before activating the (*,G) entry.

When creating (S,G) state, if the source is internal to the BGMP
speaker's domain, a "Poison-Reverse" bit (PR-bit) is set. This bit
indicates that the router may receive packets matching (S,G) anyway
due to the BGMP speaker being a member of a domain on the path
between S and the root domain. (Depending on the M-IGP protocol, it
may in fact receive such packets anyway only if it is the best exit
for the nominal root of G.)

The target from which the Join was received is then added to the
target list. The router then looks up S or the nominal root of G in
the Multicast RIB to find the next-hop EGP peer. If the target list,
not including the next-hop target towards G for a (*,G) entry,
becomes non-null as a result, the next-hop EGP peer must be notified
as follows:

a) If the next-hop peer towards the nominal root of G (for a (*,G)
   entry) is an external peer, a BGMP (*,G) Join message is unicast
   to the external peer. If the next-hop peer towards S (for an
   (S,G) entry) is an external peer, and the router does NOT have any
   active (*,G) state for that group address G, a BGMP (S,G) Join
   message is unicast to the external peer. A BGMP (S,G) Join
   message is never sent to an external peer by a router that also
   contains active (*,G) state for the same group. If the next-hop
   peer towards S (for an (S,G) entry) is an external peer and the
   router DOES have active (*,G) state for that group G, the SPT bit
   is always set to False.

b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join
   alert is sent to the M-IGP component owning the next-hop
   interface.

c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent
   to the M-IGP component owning the next-hop interface.

Finally, if an (S,G) Join is received from an internal peer, the
peer should be stored with the M-IGP component target. If (S,G)
state exists with the PR-bit set, and the next-hop towards the
nominal root for G is through the M-IGP component, an (S,G)
Poison-Reverse message is immediately sent to the internal peer.
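
The (*,G) part of the Join handling above can be sketched as follows.
This is a reduced illustration under stated assumptions: it covers
(*,G) Joins only and omits (S,G) state, the SPT bit, and the PR-bit.
The names StarGEntry and receive_star_g_join are hypothetical, as is
the lookup_root callable, which stands in for the Multicast RIB and
returns a pair of the next-hop kind ("external", "internal", or
"none") and the upstream target (the external peer, or the M-IGP
component owning the next-hop interface).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Notification = Tuple[str, str, str]  # (kind, destination target, group)


@dataclass
class StarGEntry:
    upstream: str  # next-hop target towards the nominal root of G
    targets: List[str] = field(default_factory=list)


def receive_star_g_join(
    table: Dict[str, StarGEntry],
    group: str,
    joiner: str,
    lookup_root: Callable[[str], Tuple[str, str]],
) -> List[Notification]:
    """Process a (*,G) Join from `joiner`; return notifications to emit."""
    entry = table.get(group)
    peer_kind, upstream = lookup_root(group)
    if entry is None:
        # New entry: the target list starts with the upstream target.
        entry = StarGEntry(upstream=upstream, targets=[upstream])
        table[group] = entry
    if joiner in entry.targets:
        return []  # already listed: no further action
    had_downstream = any(t != entry.upstream for t in entry.targets)
    entry.targets.append(joiner)
    if had_downstream:
        return []  # upstream was already notified for an earlier Join
    if peer_kind == "external":
        return [("bgmp_join", entry.upstream, group)]  # case a)
    return [("join_alert", entry.upstream, group)]     # cases b) and c)


table: Dict[str, StarGEntry] = {}
lookup = lambda g: ("external", "peer_up")
print(receive_star_g_join(table, "G", "migp", lookup))       # [('bgmp_join', 'peer_up', 'G')]
print(receive_star_g_join(table, "G", "peer_down", lookup))  # []
```

Only the first downstream Join triggers a notification, because the
target list (not counting the upstream target) becomes non-null only
once.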

If an (S,G) Join is received from an external peer, and (S,G)
state exists with the PR-bit set, and the local BGMP speaker is
the best exit for the nominal root of G, and the next-hop towards
the nominal root for G is through the interface towards the
external peer, an (S,G) Poison-Reverse message is immediately sent
to the external peer.

6.3.2. Receiving Prune Notifications

When the BGMP component receives a (*,G) or (S,G) Prune alert from
another component, or a BGMP (*,G) or (S,G) Prune message from an
external peer, it searches the tree state table for a matching entry.
If no (S,G) entry was found for an (S,G) Prune, but (*,G) state
exists, an (S,G) entry is created, with the target list copied from
the (*,G) entry. If no matching entry exists, or if the component or
peer is not listed in the target list, no further actions are taken.
Otherwise, the component or peer is removed from the target list. If
the target list becomes null as a result, the next-hop peer towards
the nominal root of G (for a (*,G) entry), or towards S (for an (S,G)
entry if and only if the BGMP router does NOT have any corresponding
(*,G) entry), must be notified as follows.

a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune
   message is unicast to it.

b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune
   alert is sent to the M-IGP component owning the next-hop
   interface.

c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent
   to the M-IGP component owning the next-hop interface.

6.3.3. Receiving Route Change Notifications

When a border router receives a route for a new prefix in the
multicast RIB, or an existing route for a prefix is withdrawn, a route
change notification for that prefix must be sent to the BGMP
component.
In addition, when the next hop peer (according to the
multicast RIB) changes, a route change notification for that prefix
must be sent to the BGMP component.

In addition, in IPv4 (only), an internal route for each class-D
prefix associated with the domain (if any) MUST be injected into the
multicast RIB in the EGP by the domain's border routers.

When a route for a new group prefix is learned, or an existing route
for a group prefix is withdrawn, or the next-hop peer for a group
prefix changes, a BGMP router updates all affected (*,G) target
lists. The router sends a (*,G) Join to the new next-hop target, and
a (*,G) Prune to the old next-hop target, as appropriate. In
addition, if any (S,G) state exists with the PR-bit set:

o  If the BGMP speaker has just become the best exit for the nominal
   root of G, an (S,G) Poison-Reverse message with the PR-bit set is
   sent as noted below.

o  If the BGMP speaker was the best exit for the nominal root of G
   and is no longer, an (S,G) Poison-Reverse message with the PR-bit
   clear is sent as noted below.

The (S,G) Poison-Reverse messages are sent to all external peers on
the next-hop interface towards the nominal root of G from which (S,G)
Joins have been received.

When an existing route for a source prefix is withdrawn, or the
next-hop peer for a source prefix changes, a BGMP router updates all
affected (S,G) target lists. The router sends an (S,G) Join to the
new next-hop target, and an (S,G) Prune to the old next-hop target, as
appropriate.

6.3.4. Receiving (S,G) Poison-Reverse messages

When a BGMP speaker receives an (S,G) Poison-Reverse message from a
peer, it sets the PR-bit on the (S,G) state to match the PR-bit in
the message, and looks up the next-hop towards the nominal root of G.
   If the next-hop target is an M-IGP component, it forwards the (S,G)
   Poison-Reverse message to all internal peers of that component from
   which it has received (S,G) Joins.  If the next-hop target is an
   external peer on a given interface, it forwards the (S,G)
   Poison-Reverse message to all external peers on that interface.

   When a BGMP speaker receives an (S,G) Poison-Reverse message from an
   external peer, with the PR-bit set, and the speaker has received no
   (S,G) Joins from any other peers (e.g., only from the M-IGP, or has
   (S,G) state due to encapsulation as described in 6.4.1), it knows
   that its own (S,G) Join is unnecessary, and should send an (S,G)
   Prune.

   When a BGMP speaker receives an (S,G) Poison-Reverse message from an
   internal peer, with the PR-bit set, and the speaker is the best exit
   for the nominal root of G, and has (S,G) prune state, an (S,G) Join
   message is sent to cancel the prune state and the state is deleted.

6.4.  Interaction with M-IGP components

   When an M-IGP component on a border router first learns that there
   are internally-reached members for a group G (whose scope is larger
   than that domain), a (*,G) Join alert is sent to the BGMP component.
   Similarly, when an M-IGP component on a border router learns that
   there are no longer internally-reached members for a group G (whose
   scope is larger than a single domain), a (*,G) Prune alert is sent
   to the BGMP component.

   At any time, any M-IGP domain MAY decide to join a source-specific
   branch for some external source S and group G.  When the M-IGP
   component in the border router that is the next-hop router for a
   particular source S learns that a receiver wishes to receive data
   from S on a source-specific path, an (S,G) Join alert is sent to the
   BGMP component.
   When it is learned that such receivers no longer exist, an (S,G)
   Prune alert is sent to the BGMP component.  Recall that the BGMP
   component will generate external source-specific Joins only where
   the source-specific branch does not coincide with the shared
   distribution tree for that group.

   Finally, we will require that the border router that is the next-hop
   internal peer for a particular address S or the nominal root of G be
   able to forward data for a matching tree state table entry to all
   members within the domain.  This requirement has implications on
   specific M-IGPs as follows.

6.4.1.  Interaction with DVMRP and PIM-DM

   DVMRP and PIM-DM are both "broadcast and prune" protocols in which
   every data packet must pass an RPF check against the packet's source
   address, or be dropped.  If the border router receiving packets from
   an external source is the only BR to inject the route for the source
   into the domain, then there are no problems.  For example, this will
   always be true for stub domains with a single border router (see
   Figure 1).  Otherwise, the border router receiving packets
   externally is responsible for encapsulating the data to any other
   border routers that must inject the data into the domain for RPF
   checks to succeed.

   When an intended border router injector for a source receives
   encapsulated packets from another border router in its domain, it
   should create source-specific (S,G) BGMP state.  Note that the
   border router may be configured to do this on a data-rate triggered
   basis so that the state is not created for very low
   data-rate/intermittent sources.  If source-specific state is
   created, then its incoming interface points to the virtual
   encapsulation interface from the border router that forwarded the
   packet, and it has an SPT flag that is initialized to be False.
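
   The state creation described in the preceding paragraph can be
   sketched as follows.  This is only an illustration under stated
   assumptions: the names SGState, on_encapsulated_packet, rate_bps and
   threshold_bps, and the "encap:" interface label, are hypothetical
   and are not defined by this specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SGState:
    """Hypothetical (S,G) tree state entry for the decapsulating BR."""
    incoming_interface: str
    spt: bool = False  # SPT flag starts out False


def on_encapsulated_packet(
    table: Dict[Tuple[str, str], SGState],
    source: str,
    group: str,
    encapsulator: str,
    rate_bps: float,
    threshold_bps: float = 0.0,
) -> Optional[SGState]:
    """Create (S,G) state when encapsulated data arrives from another BR.

    The data-rate trigger is optional in the text; threshold_bps = 0
    means state is created on the first encapsulated packet.
    """
    key = (source, group)
    if key in table:
        return table[key]
    if rate_bps < threshold_bps:
        # Low-rate or intermittent source: do not create state.
        return None
    # Incoming interface is the virtual encapsulation interface from
    # the border router that forwarded the packet.
    state = SGState(incoming_interface=f"encap:{encapsulator}", spt=False)
    table[key] = state
    return state
```

   A caller would pass its tree state table and the identity of the
   encapsulating border router; a return value of None means the rate
   trigger suppressed state creation.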

   When the (S,G) BGMP state is created, the BGMP component will in
   turn send a BGMP (S,G) Join message to the next-hop external peer
   towards S if there is no (*,G) state for that same group, G.  The
   (S,G) BGMP state will have the SPT bit set to False if (*,G) BGMP
   state is present.

   When the first data packet from S arrives from the external peer and
   matches on the BGMP (S,G) state, and IF there is no (*,G) state, the
   router sets the SPT flag to True, resets the incoming interface to
   point to the external peer, and sends a BGMP (S,G) Prune message to
   the border router that was encapsulating the packets (e.g., in
   Figure 1, BR11 sends the (Src_A,G) Prune to BR12).  When the border
   router with (*,G) state receives the prune for (S,G), it then
   deletes that border router from its list of targets.

   If the decapsulator receives an (S,G) Poison-Reverse message with
   the PR-bit set, it will forward it to the encapsulator (which may
   again forward it up the shared tree according to normal BGMP rules),
   and both will delete their BGMP (S,G) state.

   PIM-DM and DVMRP present an additional problem, i.e., no protocol
   mechanism exists for joining and pruning entire groups; only joins
   and prunes for individual sources are available.  If any such domain
   desires to be able to serve as a transit domain, we require that
   some form of Domain-Wide Reports (DWRs) [DWR] be available within
   such domains.  Such messages provide the ability to join and prune
   an entire group across the domain.  One simple heuristic to
   approximate DWRs is to assume that if there are any
   internally-reached members, then at least one of them is a sender.
   With this heuristic, the presence of any M-IGP (S,G) state for
   internally-reached sources can be used instead.  Sending a data
   packet to a group is then equivalent to sending a DWR for the group.

6.4.2.  Interaction with PIM-SM

   Protocols such as PIM-SM build unidirectional shared and
   source-specific trees.  As with DVMRP and PIM-DM, every data packet
   must pass an RPF check against some group-specific or
   source-specific address.

   The fewest encapsulations/decapsulations will be done when the
   intra-domain tree is rooted at the next-hop internal peer (which
   becomes the RP) towards the nominal root of G, since in general that
   router will receive the most packets from external sources.  To
   achieve this, each BGMP border router to a PIM-SM domain should send
   Candidate-RP-Advertisements within the domain for those groups for
   which it is the shared-domain tree ingress router.  When the border
   router that is the RP for a group G receives an external data
   packet, it forwards the packet according to the M-IGP (i.e., PIM-SM)
   shared-tree outgoing interface list.

   Other border routers will receive data packets from external sources
   that are farther down the bidirectional tree of domains.  When a
   border router that is not the RP receives an external packet for
   which it does not have a source-specific entry, the border router
   treats it like a local source by creating (S,G) state with a
   Register flag set, based on normal PIM-SM rules; the border router
   then encapsulates the data packets in PIM-SM Registers and unicasts
   them to the RP for the group.  As explained above, the RP for the
   inter-domain group will be one of the other border routers of the
   domain.

   If a source's data rate is high enough, DRs within the PIM-SM domain
   may switch to the shortest path tree.  If the shortest path to an
   external source is via the group's ingress router for the shared
   tree, the new (S,G) state in the BGMP border router will not cause
   BGMP (S,G) Joins because that border router will already have (*,G)
   state.
   If, however, the shortest path to an external source is via some
   other border router, that border router will create (S,G) BGMP state
   in response to the M-IGP (S,G) Join alert.  In this case, because
   there is no local (*,G) state to suppress it, the border router will
   send a BGMP (S,G) Join to the next-hop external peer towards S, in
   order to pull the data down directly.  (See BR11 in Figure 1.)  As
   in normal PIM-SM operation, those PIM-SM routers that have (*,G) and
   (S,G) state pointing to different incoming interfaces will prune
   that source off the shared tree.  Therefore, all internal interfaces
   may be eventually pruned off the internal shared tree.

   After the border router sends a BGMP (S,G) Join, if its (S,G) state
   has the PR-bit clear, an (S,G) Poison-Reverse message (with the
   PR-bit clear) is sent to the ingress router for G.  The ingress
   router then creates (S,G) state if it does not already exist, and
   removes the next hop towards the nominal root of G from the target
   list.

   If the border router later receives an (S,G) Poison-Reverse message
   with the PR-bit set, the Poison-Reverse message is forwarded to the
   ingress router for G.  The best-exit router then creates (S,G) state
   if it does not already exist, and puts the next hop towards the
   nominal root of G in the target list if not already present.

6.4.3.  Interaction with CBT

   CBT builds bidirectional shared trees but must address two points of
   compatibility with BGMP.  First, CBT cannot accommodate more than
   one border router injecting a packet.  Therefore, if a CBT domain
   does have multiple external connections, the M-IGP components of the
   border routers are responsible for ensuring that only one of them
   will inject data from any given source.

   Second, CBT cannot process source-specific Joins or Prunes.
   Two options thus exist for each CBT domain:

   Option A:
      The CBT component interprets an (S,G) Join alert as if it were a
      (*,G) Join alert, as described in [INTEROP].  That is, if it is
      not already on the core-tree for G, then it sends a CBT (*,G)
      JOIN-REQUEST message towards the core for G.  Similarly, when the
      CBT component receives an (S,G) Prune alert, and the child
      interface list for a group is NULL, then it sends a (*,G)
      QUIT_NOTIFICATION towards the core for G.  This option has the
      disadvantage of pulling all data for the group G down to the CBT
      domain when no members exist.

   Option B:
      The CBT domain does not propagate any source routes (i.e.,
      non-class D routes) to its external peers for the Multicast RIB
      unless it is known that no other path exists to that prefix
      (e.g., routes for prefixes internal to the domain or in a
      singly-homed customer's domain may be propagated).  This ensures
      that source-specific joins are never received unless the source's
      data already passes through the domain on the shared tree, in
      which case the (S,G) Join need not be propagated anyway.  BGMP
      border routers will only send source-specific Joins or Prunes to
      an external peer if that external peer advertises source-prefixes
      in the EGP.  If a BGMP-CBT border router does receive an (S,G)
      Join or Prune, that border router should ignore the message.

   To minimize en/de-capsulations, CBTv2 BRs may follow the same scheme
   as described under PIM-SM above, in which Candidate-Core
   advertisements are sent for those groups for which the BR is the
   shared-tree ingress router.

6.4.4.  Interaction with MOSPF

   As with CBTv2, MOSPF cannot process source-specific Joins or Prunes,
   and the same two options are available.
   Therefore, an MOSPF domain may either:

   Option A:
      send a Group-Membership-LSA for all of G in response to a (S,G)
      Join alert, and "prematurely age" it out (when no other
      downstream members exist) in response to an (S,G) Prune alert, OR

   Option B:
      not propagate any source routes (i.e., non-class D routes) to
      their external peers for the Multicast RIB unless it is known
      that no other path exists to that prefix (e.g., routes for
      prefixes internal to the domain or in a singly-homed customer's
      domain may be propagated)

6.5.  Operation over Multi-access Networks

   Multiaccess links require special handling to prevent duplicates.
   The following mechanism enables BGMP to operate over multiaccess
   links which do not run an M-IGP.  This avoids broadcast-and-prune
   behavior and does not require (S,G) state.

   To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF
   message to exchange "forwarder preference" values for each prefix.
   The peer with the highest forwarder preference becomes the
   designated forwarder, with ties broken by lowest BGMP Identifier.
   The designated forwarder is the router responsible for forwarding
   packets up the tree, and is the peer to which joins will be sent.

   When BGMP first learns that a route exists in the multicast RIB
   whose next-hop interface is NOT the multiaccess link, the BGMP
   router sends a BGMP FWDR_PREF message for the prefix, to all BGMP
   peers on the LAN.  The FWDR_PREF message contains a "forwarder
   preference value" for the local router, and the same value MUST be
   sent to all peers on the LAN.  Likewise, when the prefix is no
   longer reachable, a FWDR_PREF of 0 is sent to all peers on the LAN.

   Whenever a BGMP router calculates the next-hop peer towards a
   particular address, and that peer is reached over a BGMP-owned
   multiaccess LAN, the designated forwarder is used instead.
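
   The election rule above (highest preference wins, ties broken by
   lowest BGMP Identifier) can be sketched as follows.  This is a
   minimal illustration, not a normative algorithm.  The function name
   and the integer representation of BGMP Identifiers are hypothetical,
   and treating a preference of 0 as "not a candidate" is an assumption
   drawn from the rule that 0 is advertised when a prefix becomes
   unreachable.

```python
from typing import Dict, Optional


def elect_designated_forwarder(prefs: Dict[int, int]) -> Optional[int]:
    """Pick the designated forwarder for one prefix on one LAN.

    prefs maps each peer's BGMP Identifier (as an integer) to the
    forwarder preference it advertised in FWDR_PREF.
    Returns the winning BGMP Identifier, or None if nobody qualifies.
    """
    # Assumption: preference 0 means the peer cannot reach the prefix.
    candidates = {ident: pref for ident, pref in prefs.items() if pref > 0}
    if not candidates:
        return None
    # Highest preference first; among equals, the lowest identifier.
    return min(candidates, key=lambda ident: (-candidates[ident], ident))
```

   For example, with preferences {10: 5, 20: 7, 30: 7} the peers 20
   and 30 tie on preference 7, and peer 20 is chosen because it has
   the lower identifier.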

   When a BGMP router receives a FWDR_PREF message from a peer, it
   looks up the matching route in its multicast RIB, and calculates the
   new designated forwarder.  If the router has tree state entries
   whose parent target was the old forwarder, it sends Joins to the new
   forwarder and Prunes to the old forwarder.

   When a BGMP router which is NOT the designated forwarder receives a
   packet on the multiaccess link, it is silently dropped.

   Finally, this mechanism prevents duplicates where full peering
   exists on a "logical" link.  Where full peering does not exist,
   steps must be taken (outside of BGMP) to present separate logical
   interfaces to BGMP, each of which is a link with full peering.  This
   might entail, for example, using different link-layer address
   mappings, doing encapsulation, or changing the physical media.

6.6.  Interaction between (S,G) state and G-routes

   As discussed earlier, routers with (*,G) state will not propagate
   (S,G) joins.  However, a special case occurs when (S,G) state
   coincides with the G-route (or route towards the nominal root of G).
   When this occurs, care must be taken so that the data will reach the
   root domain without causing duplicates or black holes.  For this
   reason, (S,G) state on the path between the source and the root
   domain is annotated as being "poison-reversed".  A PR-bit is kept
   for this purpose, which is updated by (UN)POISON_REVERSE messages.

   The PR-bit indicates to BGMP nodes whether they need to forward
   packets up towards the root domain.  For example, in a case where an
   (S,G) branch exists, a transit domain may get packets along the
   (S,G) branch, and needs to know whether to (also) forward them up
   towards the root domain.  If the domain in question is on the path
   between S and the root domain, then the answer is yes (and the PR
   bit will be set on the S,G state).
   If the domain in question is not on the path between S and the root
   domain, then the answer is no (and the PR bit will be clear on the
   S,G state).

7.  Message Formats

   This section describes message formats used by BGMP.

   Messages are sent over a reliable transport protocol connection.  A
   message is processed only after it is entirely received.  The
   maximum message size is 4096 octets.  All implementations are
   required to support this maximum message size.

   All fields labelled "Reserved" below must be transmitted as 0, and
   ignored upon receipt.

7.1.  Message Header Format

   Each message has a fixed-size (4-byte) header.  There may or may not
   be a data portion following the header, depending on the message
   type.  The layout of these fields is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Length:
      This 2-octet unsigned integer indicates the total length of the
      message, including the header, in octets.  Thus, e.g., it allows
      one to locate in the transport-level stream the start of the next
      message.  The value of the Length field must always be at least 4
      and no greater than 4096, and may be further constrained,
      depending on the message type.  No "padding" of extra data after
      the message is allowed, so the Length field must have the
      smallest value required given the rest of the message.

   Type:
      This 1-octet unsigned integer indicates the type code of the
      message.  The following type codes are defined:

         1 - OPEN
         2 - UPDATE
         3 - NOTIFICATION
         4 - KEEPALIVE

7.2.  OPEN Message Format

   After a transport protocol connection is established, the first
   message sent by each side is an OPEN message.
   If the OPEN message is acceptable, a KEEPALIVE message confirming
   the OPEN is sent back.  Once the OPEN is confirmed, UPDATE,
   KEEPALIVE, and NOTIFICATION messages may be exchanged.

   In addition to the fixed-size BGMP header, the OPEN message contains
   the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    | Rsvd| AddrFam |           Hold Time           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               BGMP Identifier (variable length)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                     (Optional Parameters)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Version:
      This 1-octet unsigned integer indicates the protocol version
      number of the message.  The current BGMP version number is 1.

   AddrFam:
      The IANA-assigned address family number of the BGMP Identifier.
      These include (among others):

         Number    Description
         ------    -----------
            1      IP (IP version 4)
            2      IPv6 (IP version 6)

   Hold Time:
      This 2-octet unsigned integer indicates the number of seconds
      that the sender proposes for the value of the Hold Timer.  Upon
      receipt of an OPEN message, a BGMP speaker MUST calculate the
      value of the Hold Timer by using the smaller of its configured
      Hold Time and the Hold Time received in the OPEN message.  The
      Hold Time MUST be either zero or at least three seconds.  An
      implementation may reject connections on the basis of the Hold
      Time.  The calculated value indicates the maximum number of
      seconds that may elapse between the receipt of successive
      KEEPALIVE, and/or UPDATE messages by the sender.

   BGMP Identifier:
      This 4-octet (for IPv4) or 16-octet (IPv6) unsigned integer
      indicates the BGMP Identifier of the sender.
      A given BGMP speaker sets the value of its BGMP Identifier to a
      globally-unique value assigned to that BGMP speaker (e.g., an
      IPv4 address).  The value of the BGMP Identifier is determined on
      startup and is the same for every BGMP session opened.

   Optional Parameters:
      This field may contain a list of optional parameters, where each
      parameter is encoded as a (Parameter Type, Parameter Length,
      Parameter Value) triplet.  The combined length of all optional
      parameters can be derived from the Length field in the message
      header.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
      |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

      Parameter Type is a one octet field that unambiguously identifies
      individual parameters.  Parameter Length is a one octet field
      that contains the length of the Parameter Value field in octets.
      Parameter Value is a variable length field that is interpreted
      according to the value of the Parameter Type field.

      This document defines the following Optional Parameters:

      a) Authentication Information (Parameter Type 1):
         This optional parameter may be used to authenticate a BGMP
         peer.  The Parameter Value field contains a 1-octet
         Authentication Code followed by a variable length
         Authentication Data.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |  Auth. Code   |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                                                     |
         |                Authentication Data                  |
         |                                                     |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Authentication Code:

            This 1-octet unsigned integer indicates the authentication
            mechanism being used.
            Whenever an authentication mechanism is specified for use
            within BGMP, two things must be included in the
            specification:

            - the value of the Authentication Code which indicates use
              of the mechanism, and
            - the form and meaning of the Authentication Data.

            Note that a separate authentication mechanism may be used
            in establishing the transport level connection.

         Authentication Data:

            The form and meaning of this variable-length field depend
            on the Authentication Code.

         The minimum length of the OPEN message is 12 octets (including
         message header).

      b) Capability Information (Parameter Type 2):
         This is an Optional Parameter that is used by a BGMP-speaker
         to convey to its peer the list of capabilities supported by
         the speaker.  The parameter contains one or more triples
         (Capability Code, Capability Length, Capability Value), where
         each triple is encoded as shown below:

         +------------------------------+
         | Capability Code (1 octet)    |
         +------------------------------+
         | Capability Length (1 octet)  |
         +------------------------------+
         | Capability Value (variable)  |
         +------------------------------+

         Capability Code:

            Capability Code is a one octet field that unambiguously
            identifies individual capabilities.

         Capability Length:

            Capability Length is a one octet field that contains the
            length of the Capability Value field in octets.

         Capability Value:

            Capability Value is a variable length field that is
            interpreted according to the value of the Capability Code
            field.

         A particular capability, as identified by its Capability Code,
         may occur more than once within the Optional Parameter.

         This document reserves Capability Codes 128-255 for
         vendor-specific applications.

         This document reserves value 0.
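
         As an illustration of the triple encoding above, the following
         sketch walks the Parameter Value of a Capability Information
         parameter.  It is a non-normative example; the function names
         are hypothetical, and raising an exception on a truncated
         triple is a choice of this sketch rather than a behavior
         required by this document.

```python
from typing import List, Tuple


def parse_capabilities(value: bytes) -> List[Tuple[int, bytes]]:
    """Split a Capability Information value into (code, value) pairs."""
    caps: List[Tuple[int, bytes]] = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated capability header")
        code = value[offset]          # Capability Code, 1 octet
        length = value[offset + 1]    # Capability Length, 1 octet
        start = offset + 2
        end = start + length
        if end > len(value):
            raise ValueError("capability value runs past the parameter")
        caps.append((code, value[start:end]))
        offset = end
    return caps


def is_vendor_specific(code: int) -> bool:
    """Codes 128-255 are reserved for vendor-specific applications."""
    return 128 <= code <= 255
```

         Because a capability may occur more than once, the sketch
         returns a list of pairs rather than a mapping keyed by code.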

         Capability Codes (other than those reserved for vendor
         specific use) are assigned only by the IETF consensus process
         and IESG approval.

7.3.  UPDATE Message Format

   UPDATE messages are used to transfer Join/Prune/FwdrPref information
   between BGMP peers.  The UPDATE message always includes the
   fixed-size BGMP header, and one or more attributes as described
   below.

   The message format below allows compact encoding of (*,G) Joins and
   Prunes, while allowing the flexibility needed to do other updates
   such as (S,G) Joins and Prunes towards sources as well as on the
   shared tree.  In the discussion below, an Encoded-Address-Prefix is
   of the form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |EnTyp| AddrFam |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Address (variable length)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Mask (variable length)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   EnTyp:
      0 - All 1's Mask.  The Mask field is 0 bytes long.
      1 - Mask length included.  The Mask field is 4 bytes long, and
          contains the mask length, in bits.
      2 - Full Mask included.  The Mask field is the same length as the
          Address field, and contains the full bitmask.

   AddrFam:
      The IANA-assigned address family number of the encoded prefix.

   Address:
      The address associated with the given prefix to be encoded.  The
      length is determined based on the Address Family.

   Mask:
      The mask associated with the given prefix.  The format (or
      absence) of this field is determined by the EnTyp field.
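
   The three EnTyp encodings can be illustrated with the following
   sketch, which serializes a prefix into an Encoded-Address-Prefix.
   It is a non-normative example under stated assumptions: the function
   name is hypothetical, bit 0 in the diagram is taken to be the most
   significant bit (so EnTyp occupies the top three bits of the first
   octet), and the 4-byte mask length is assumed to be in network byte
   order.

```python
import ipaddress

# IANA address family numbers used in this document: 1 = IPv4, 2 = IPv6.
ADDR_FAMILY = {4: 1, 6: 2}


def encode_prefix(prefix: str, entyp: int) -> bytes:
    """Serialize a prefix such as '224.1.0.0/16' for a given EnTyp."""
    net = ipaddress.ip_network(prefix, strict=True)
    addrfam = ADDR_FAMILY[net.version]
    # Assumption: EnTyp is the high 3 bits, AddrFam the low 5 bits.
    first_octet = (entyp << 5) | addrfam
    address = net.network_address.packed  # 4 or 16 octets
    if entyp == 0:
        # All 1's mask: only host prefixes can be expressed.
        if net.prefixlen != net.max_prefixlen:
            raise ValueError("EnTyp 0 implies an all-1's mask")
        mask = b""
    elif entyp == 1:
        # Assumption: mask length carried as a big-endian 32-bit value.
        mask = net.prefixlen.to_bytes(4, "big")
    elif entyp == 2:
        mask = net.netmask.packed  # same length as the address
    else:
        raise ValueError("unknown EnTyp")
    return bytes([first_octet]) + address + mask
```

   Under these assumptions, EnTyp 0 yields 5 octets for an IPv4 host
   address, EnTyp 1 yields 9 octets, and EnTyp 2 yields 9 octets for
   IPv4 or 33 octets for IPv6.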

   Each attribute is of the form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |    Data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   All attributes are 4-byte aligned.

   Length:
      The Length is the length of the entire attribute, including the
      length, type, and data fields.  If other attributes are nested
      within the data field, the length includes the size of all such
      nested attributes.

   Type:
      Types 128-255 are reserved for "optional" attributes.  If a
      required attribute is unrecognized, a NOTIFICATION will be sent
      and the connection will be closed if the error is a fatal one.
      Unrecognized optional attributes are simply ignored.

         0 - JOIN
         1 - PRUNE
         2 - GROUP
         3 - SOURCE
         4 - FWDR_PREF
         5 - POISON_REVERSE

   a) JOIN (Type Code 0)

      The JOIN attribute indicates that all GROUP or SOURCE attributes
      nested immediately within the JOIN attribute should be joined.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=0     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested
      within a JOIN attribute.

   b) PRUNE (Type Code 1)

      The PRUNE attribute indicates that all GROUP or SOURCE attributes
      nested immediately within the PRUNE attribute should be pruned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=1     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested
      within a PRUNE attribute.

   c) GROUP (Type Code 2)

      The GROUP attribute identifies a given group-prefix.  In
      addition, any attributes nested immediately within the GROUP
      attribute also apply to the given group-prefix.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=2     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Encoded-Address-Prefix
         The multicast group prefix to be joined or pruned, in the
         format described above.

      Nested Attributes
         No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
         nested within a GROUP attribute.

   d) SOURCE (Type Code 3):

      The SOURCE attribute identifies a given source-prefix.  In
      addition, any attributes nested immediately within the SOURCE
      attribute also apply to the given source-prefix.

      The SOURCE attribute has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=3     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Encoded-Address-Prefix
         The Source-prefix in the format described above.

      Nested Attributes
         No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
         nested within a SOURCE attribute.

   e) FWDR_PREF (Type Code 4)

      The FWDR_PREF attribute provides a forwarder preference value for
      all GROUP or SOURCE attributes nested immediately within the
      FWDR_PREF attribute.  It is used by a BGMP speaker to inform
      other BGMP speakers of the originating speaker's degree of
      preference for a given group or source prefix.  Usage of this
      attribute is described in 6.5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=4     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Preference Value                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Preference Value
         A 32-bit non-negative integer.

      Nested Attributes
         No JOIN, PRUNE, or FWDR_PREF attributes may be immediately
         nested within a FWDR_PREF attribute.

   f) POISON_REVERSE (Type Code 5)

      The POISON_REVERSE attribute provides a "poison-reverse" (PR-bit)
      value for all SOURCE attributes nested immediately within the
      POISON_REVERSE attribute.  It is used by a BGMP speaker to inform
      other BGMP speakers from which it has received (S,G) Joins that
      they are on the path of domains between the source and the root
      domain.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=5     |  Reserved   |P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      P
         The PR-bit value.

      Nested Attributes
         No attributes defined in this document other than SOURCE may
         be immediately nested within a POISON_REVERSE attribute.

7.4.  Encoding examples

   Below are enumerated examples of how various updates are built using
   nested attributes, where A ( B ) denotes that attribute B is nested
   within attribute A.

   (*,G-prefix) Join:
      JOIN ( GROUP )
   (*,G-prefix) Prune:
      PRUNE ( GROUP )
   (S,G) Join towards S:
      GROUP ( JOIN ( SOURCE ) )
   (S,G) Join cancelling prune towards root of G:
      GROUP ( JOIN ( SOURCE ) )
   (S,G) Prune towards S:
      GROUP ( PRUNE ( SOURCE ) )
   (S,G) Prune towards root of G:
      GROUP ( PRUNE ( SOURCE ) )
   Switch from (*,G) to (S,G):
      PRUNE ( GROUP ( JOIN ( SOURCE ) ) )
   Switch from (S,G) to (*,G):
      JOIN ( GROUP )
   Initial (*,G) Join with S pruned:
      JOIN ( GROUP ( PRUNE ( SOURCE ) ) )
   Forwarder preference announcement for G-prefix:
      FWDR_PREF ( GROUP )
   Forwarder preference announcement for S-prefix:
      FWDR_PREF ( SOURCE )

7.5.  KEEPALIVE Message Format

   BGMP does not use any transport protocol-based keep-alive mechanism
   to determine if peers are reachable.
   Instead, KEEPALIVE messages are exchanged between peers often enough
   as not to cause the Hold Timer to expire.  A reasonable maximum time
   between the last KEEPALIVE or UPDATE message sent, and the time at
   which a KEEPALIVE message is sent, would be one third of the Hold
   Time interval.  KEEPALIVE messages MUST NOT be sent more frequently
   than one per second.  An implementation MAY adjust the rate at which
   it sends KEEPALIVE messages as a function of the Hold Time interval.

   If the negotiated Hold Time interval is zero, then periodic
   KEEPALIVE messages MUST NOT be sent.

   A KEEPALIVE message consists of only a message header, and has a
   length of 4 octets.

7.6.  NOTIFICATION Message Format

   A NOTIFICATION message is sent when an error condition is detected.
   The BGMP connection is closed immediately after sending it if the
   error is a fatal one.

   In addition to the fixed-size BGMP header, the NOTIFICATION message
   contains the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |O| Error code  | Error subcode |             Data              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   O-bit:
      Open-bit.  If clear, the connection will be closed.  If set,
      indicates the error is not fatal.

   Error Code:

      This 1-octet unsigned integer indicates the type of
      NOTIFICATION.
      The following Error Codes have been defined:

      Error Code   Symbolic Name                Reference

          1        Message Header Error         Section 8.1
          2        OPEN Message Error           Section 8.2
          3        UPDATE Message Error         Section 8.3
          4        Hold Timer Expired           Section 8.5
          5        Finite State Machine Error   Section 8.6
          6        Cease                        Section 8.7

   Error subcode:
      This 1-octet unsigned integer provides more specific
      information about the nature of the reported error.  Each Error
      Code may have one or more Error Subcodes associated with it.
      If no appropriate Error Subcode is defined, then a zero
      (Unspecific) value is used for the Error Subcode field.
      The notation (MC) below indicates the error is a fatal one
      and the O-bit must be clear.  Non-fatal subcodes SHOULD
      be sent with the O-bit set.

      Message Header Error subcodes:

          2 - Bad Message Length (MC)
          3 - Bad Message Type (MC)

      OPEN Message Error subcodes:

          1 - Unsupported Version (MC)
          4 - Unsupported Optional Parameter
          5 - Authentication Failure (MC)
          6 - Unacceptable Hold Time (MC)
          7 - Unsupported Capability (MC)

      UPDATE Message Error subcodes:

          1 - Malformed Attribute List (MC)
          2 - Unrecognized Attribute Type
          5 - Attribute Length Error (MC)
         10 - Invalid Address
         11 - Invalid Mask
         13 - Unrecognized Address Family

   Data:
      This variable-length field is used to diagnose the reason for
      the NOTIFICATION.  The contents of the Data field depend upon
      the Error Code and Error Subcode.  See Section 8 below for more
      details.

      Note that the length of the Data field can be determined from
      the message Length field by the formula:

         Message Length = 6 + Data Length

   The minimum length of the NOTIFICATION message is 6 octets
   (including message header).

8.  BGMP Error Handling

   This section describes actions to be taken when errors are detected
   while processing BGMP messages.  BGMP Error Handling is similar to
   that of BGP [BGP].

   When any of the conditions described here are detected, a
   NOTIFICATION message with the indicated Error Code, Error Subcode,
   and Data fields is sent, and the BGMP connection is closed if the
   error is a fatal one.  If no Error Subcode is specified, then a zero
   must be used.

   The phrase "the BGMP connection is closed" means that the transport
   protocol connection has been closed and that all resources for that
   BGMP connection have been deallocated.  The remote peer is removed
   from the target list of all tree state entries.

   Unless specified explicitly, the Data field of the NOTIFICATION
   message that is sent to indicate an error is empty.

8.1.  Message Header error handling

   All errors detected while processing the Message Header are indicated
   by sending the NOTIFICATION message with Error Code Message Header
   Error.  The Error Subcode elaborates on the specific nature of the
   error.

   If the Length field of the message header is less than 4 or greater
   than 4096, or if the Length field of an OPEN message is less than
   the minimum length of the OPEN message, or if the Length field of an
   UPDATE message is less than the minimum length of the UPDATE message,
   or if the Length field of a KEEPALIVE message is not equal to 4, then
   the Error Subcode is set to Bad Message Length.  The Data field
   contains the erroneous Length field.

   If the Type field of the message header is not recognized, then the
   Error Subcode is set to Bad Message Type.  The Data field contains
   the erroneous Type field.

8.2.  OPEN message error handling

   All errors detected while processing the OPEN message are indicated
   by sending the NOTIFICATION message with Error Code OPEN Message
   Error.  The Error Subcode elaborates on the specific nature of the
   error.

   If the version number contained in the Version field of the received
   OPEN message is not supported, then the Error Subcode is set to
   Unsupported Version Number.  The Data field is a 2-octet unsigned
   integer, which indicates the largest locally supported version number
   less than the version the remote BGMP peer bid (as indicated in the
   received OPEN message).

   If the Hold Time field of the OPEN message is unacceptable, then the
   Error Subcode MUST be set to Unacceptable Hold Time.  An
   implementation MUST reject Hold Time values of one or two seconds.
   An implementation MAY reject any proposed Hold Time.  An
   implementation which accepts a Hold Time MUST use the negotiated
   value for the Hold Time.

   If one of the Optional Parameters in the OPEN message is not
   recognized, then the Error Subcode is set to Unsupported Optional
   Parameters.

   If the OPEN message carries Authentication Information (as an
   Optional Parameter), then the corresponding authentication procedure
   is invoked.  If the authentication procedure (based on Authentication
   Code and Authentication Data) fails, then the Error Subcode is set to
   Authentication Failure.

   If the OPEN message indicates that the peer does not support a
   capability which the receiver requires, the receiver may send a
   NOTIFICATION message to the peer, and terminate peering.  The Error
   Subcode in the message is set to Unsupported Capability.  The Data
   field in the NOTIFICATION message lists the set of capabilities that
   cause the speaker to send the message.
   Each such capability is
   encoded the same way as it was encoded in the received OPEN message.

8.3.  UPDATE message error handling

   All errors detected while processing the UPDATE message are indicated
   by sending the NOTIFICATION message with Error Code UPDATE Message
   Error.  The Error Subcode elaborates on the specific nature of the
   error.

   If any recognized attribute has an Attribute Length that conflicts
   with the expected length (based on the attribute type code), then the
   Error Subcode is set to Attribute Length Error.  The Data field
   contains the erroneous attribute (type, length and value).

   If the Encoded-Address-Prefix field in some attribute is
   syntactically incorrect, then the Error Subcode is set to Invalid
   Prefix Field.

   If any other error is encountered when processing attributes (such as
   invalid nestings), then the Error Subcode is set to Malformed
   Attribute List, and the problematic attribute is included in the Data
   field.

8.4.  NOTIFICATION message error handling

   If a peer sends a NOTIFICATION message, and there is an error in that
   message, there is unfortunately no means of reporting this error via
   a subsequent NOTIFICATION message.  Any such error, such as an
   unrecognized Error Code or Error Subcode, should be noticed, logged
   locally, and brought to the attention of the administration of the
   peer.  The means to do this, however, lies outside the scope of this
   document.

8.5.  Hold Timer Expired error handling

   If a system does not receive successive KEEPALIVE and/or UPDATE
   and/or NOTIFICATION messages within the period specified in the Hold
   Time field of the OPEN message, then the NOTIFICATION message with
   Hold Timer Expired Error Code must be sent and the BGMP connection
   closed.

8.6.  Finite State Machine error handling

   Any error detected by the BGMP Finite State Machine (e.g., receipt of
   an unexpected event) is indicated by sending the NOTIFICATION message
   with Error Code Finite State Machine Error.

8.7.  Cease

   In the absence of any fatal errors (that are indicated in this
   section), a BGMP peer may choose at any given time to close its BGMP
   connection by sending the NOTIFICATION message with Error Code Cease.
   However, the Cease NOTIFICATION message must not be used when a fatal
   error indicated by this section does exist.

8.8.  Connection collision detection

   If a pair of BGMP speakers try simultaneously to establish a TCP
   connection to each other, then two parallel connections between this
   pair of speakers might well be formed.  We refer to this situation as
   connection collision.  Clearly, one of these connections must be
   closed.

   Based on the value of the BGMP Identifier, a convention is
   established for detecting which BGMP connection is to be preserved
   when a collision does occur.  The convention is to compare the BGMP
   Identifiers of the peers involved in the collision and to retain only
   the connection initiated by the BGMP speaker with the higher-valued
   BGMP Identifier.

   Upon receipt of an OPEN message, the local system must examine all of
   its connections that are in the OpenConfirm state.  A BGMP speaker
   may also examine connections in an OpenSent state if it knows the
   BGMP Identifier of the peer by means outside of the protocol.  If
   among these connections there is a connection to a remote BGMP
   speaker whose BGMP Identifier equals the one in the OPEN message,
   then the local system performs the following collision resolution
   procedure:

   1. The BGMP Identifier of the local system is compared to the BGMP
      Identifier of the remote system (as specified in the OPEN
      message).

   2. If the value of the local BGMP Identifier is less than the remote
      one, the local system closes the BGMP connection that already
      exists (the one that is already in the OpenConfirm state), and
      accepts the BGMP connection initiated by the remote system.

   3. Otherwise, the local system closes the newly created BGMP
      connection (the one associated with the newly received OPEN
      message), and continues to use the existing one (the one that is
      already in the OpenConfirm state).

   Comparing BGMP Identifiers is done by treating them as (4-octet long)
   unsigned integers.

   A connection collision with an existing BGMP connection that is in
   the Established state causes unconditional closing of the newly
   created connection.  Note that a connection collision cannot be
   detected with connections that are in Idle, or Connect, or Active
   states.

   Closing the BGMP connection (that results from the collision
   resolution procedure) is accomplished by sending the NOTIFICATION
   message with the Error Code Cease.

9.  BGMP Version Negotiation

   BGMP speakers may negotiate the version of the protocol by making
   multiple attempts to open a BGMP connection, starting with the
   highest version number each supports.  If an open attempt fails with
   an Error Code OPEN Message Error, and an Error Subcode Unsupported
   Version Number, then the BGMP speaker has available the version
   number it tried, the version number its peer tried, the version
   number passed by its peer in the NOTIFICATION message, and the
   version numbers that it supports.  If the two peers do support one or
   more common versions, then this will allow them to rapidly determine
   the highest common version.  In order to support BGMP version
   negotiation, future versions of BGMP must retain the format of the
   OPEN and NOTIFICATION messages.

9.1.  BGMP Capability Negotiation

   When a BGMP speaker sends an OPEN message to its BGMP peer, the
   message may include an Optional Parameter, called Capabilities.  The
   parameter lists the capabilities supported by the speaker.

   A BGMP speaker may use a particular capability when peering with
   another speaker only if both speakers support that capability.  A
   BGMP speaker determines the capabilities supported by its peer by
   examining the list of capabilities present in the Capabilities
   Optional Parameter carried by the OPEN message that the speaker
   receives from the peer.

10.  BGMP Finite State Machine

   This section specifies BGMP operation in terms of a Finite State
   Machine (FSM).  Following is a brief summary and overview of BGMP
   operations by state as determined by this FSM.

   Initially BGMP is in the Idle state.

   Idle state:

      In this state BGMP refuses all incoming BGMP connections.  No
      resources are allocated to the peer.  In response to the Start
      event (initiated by either system or operator) the local system
      initializes all BGMP resources, starts the ConnectRetry timer,
      initiates a transport connection to the other BGMP peer, while
      listening for a connection that may be initiated by the remote
      BGMP peer, and changes its state to Connect.  The exact value of
      the ConnectRetry timer is a local matter, but should be
      sufficiently large to allow TCP initialization.

      If a BGMP speaker detects an error, it shuts down the connection
      and changes its state to Idle.  Getting out of the Idle state
      requires generation of the Start event.  If such an event is
      generated automatically, then persistent BGMP errors may result in
      persistent flapping of the speaker.
      To avoid such a condition it
      is recommended that Start events should not be generated
      immediately for a peer that was previously transitioned to Idle
      due to an error.  For a peer that was previously transitioned to
      Idle due to an error, the time between consecutive generation of
      Start events, if such events are generated automatically, shall
      exponentially increase.  The value of the initial timer shall be
      60 seconds.  The time shall be doubled for each consecutive retry.

      Any other event received in the Idle state is ignored.

   Connect state:

      In this state BGMP is waiting for the transport protocol
      connection to be completed.

      If the transport protocol connection succeeds, the local system
      clears the ConnectRetry timer, completes initialization, sends an
      OPEN message to its peer, and changes its state to OpenSent.  If
      the transport protocol connect fails (e.g., retransmission
      timeout), the local system restarts the ConnectRetry timer,
      continues to listen for a connection that may be initiated by the
      remote BGMP peer, and changes its state to Active state.

      In response to the ConnectRetry timer expired event, the local
      system restarts the ConnectRetry timer, initiates a transport
      connection to the other BGMP peer, continues to listen for a
      connection that may be initiated by the remote BGMP peer, and
      stays in the Connect state.

      The Start event is ignored in the Connect state.

      In response to any other event (initiated by either system or
      operator), the local system releases all BGMP resources associated
      with this connection and changes its state to Idle.

   Active state:

      In this state BGMP is trying to acquire a peer by listening for an
      incoming transport protocol connection.

      If the transport protocol connection succeeds, the local system
      clears the ConnectRetry timer, completes initialization, sends an
      OPEN message to its peer, sets its Hold Timer to a large value,
      and changes its state to OpenSent.  A Hold Timer value of 4
      minutes is suggested.

      In response to the ConnectRetry timer expired event, the local
      system restarts the ConnectRetry timer, initiates a transport
      connection to the other BGMP peer, continues to listen for a
      connection that may be initiated by the remote BGMP peer, and
      changes its state to Connect.

      If the local system detects that a remote peer is trying to
      establish a BGMP connection to it, and the IP address of the
      remote peer is not an expected one, the local system restarts the
      ConnectRetry timer, rejects the attempted connection, continues to
      listen for a connection that may be initiated by the remote BGMP
      peer, and stays in the Active state.

      The Start event is ignored in the Active state.

      In response to any other event (initiated by either system or
      operator), the local system releases all BGMP resources associated
      with this connection and changes its state to Idle.

   OpenSent state:

      In this state BGMP waits for an OPEN message from its peer.  When
      an OPEN message is received, all fields are checked for
      correctness.  If the BGMP message header checking or OPEN message
      checking detects an error (see Section 8.2), or a connection
      collision (see Section 8.8) the local system sends a NOTIFICATION
      message and changes its state to Idle.

      If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
      message and sets a KeepAlive timer.  The Hold Timer, which was
      originally set to a large value (see above), is replaced with the
      negotiated Hold Time value (see section 4.2).
      If the negotiated
      Hold Time value is zero, then the Hold Timer and KeepAlive
      timer are not started.  If the value of the Autonomous System
      field is the same as the local Autonomous System number, then the
      connection is an "internal" connection; otherwise, it is
      "external".  Finally, the state is changed to OpenConfirm.

      If a disconnect notification is received from the underlying
      transport protocol, the local system closes the BGMP connection,
      restarts the ConnectRetry timer, continues to listen for a
      connection that may be initiated by the remote BGMP peer, and goes
      into the Active state.

      If the Hold Timer expires, the local system sends a NOTIFICATION
      message with Error Code Hold Timer Expired and changes its state
      to Idle.

      In response to the Stop event (initiated by either system or
      operator) the local system sends a NOTIFICATION message with Error
      Code Cease and changes its state to Idle.

      The Start event is ignored in the OpenSent state.

      In response to any other event the local system sends a
      NOTIFICATION message with Error Code Finite State Machine Error
      and changes its state to Idle.

      Whenever BGMP changes its state from OpenSent to Idle, it closes
      the BGMP (and transport-level) connection and releases all
      resources associated with that connection.

   OpenConfirm state:

      In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.

      If the local system receives a KEEPALIVE message, it changes its
      state to Established.

      If the Hold Timer expires before a KEEPALIVE message is received,
      the local system sends a NOTIFICATION message with Error Code Hold
      Timer Expired and changes its state to Idle.

      If the local system receives a NOTIFICATION message, it changes
      its state to Idle.
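
      As an illustration only, the three OpenConfirm rules stated so
      far (KEEPALIVE received, Hold Timer expired, NOTIFICATION
      received) can be sketched as a small transition function.  The
      names State, Event and open_confirm_step are hypothetical and do
      not come from this document; only the Error Code value 4 (Hold
      Timer Expired) is taken from the Error Code table.  Other
      OpenConfirm events are deliberately left out of the sketch.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class State(Enum):
    # Hypothetical names for the FSM states touched by this sketch.
    IDLE = auto()
    OPEN_CONFIRM = auto()
    ESTABLISHED = auto()


class Event(Enum):
    # Hypothetical names for the three events covered here.
    KEEPALIVE_RECEIVED = auto()
    HOLD_TIMER_EXPIRED = auto()
    NOTIFICATION_RECEIVED = auto()


# Error Code 4 is "Hold Timer Expired" in the NOTIFICATION Error Code table.
ERROR_HOLD_TIMER_EXPIRED = 4


def open_confirm_step(event: Event) -> Tuple[State, Optional[int]]:
    """Return (next state, Error Code of the NOTIFICATION to send or None)."""
    if event is Event.KEEPALIVE_RECEIVED:
        # A KEEPALIVE confirms the peer accepted our OPEN.
        return State.ESTABLISHED, None
    if event is Event.HOLD_TIMER_EXPIRED:
        # Tell the peer why we are giving up, then fall back to Idle.
        return State.IDLE, ERROR_HOLD_TIMER_EXPIRED
    if event is Event.NOTIFICATION_RECEIVED:
        # The peer reported an error; nothing is sent in reply.
        return State.IDLE, None
    raise NotImplementedError(f"event not covered by this sketch: {event}")
```

      A caller would send the NOTIFICATION, if one is returned, before
      releasing the connection resources on the move to Idle.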

      If the KeepAlive timer expires, the local system sends a KEEPALIVE
      message and restarts its KeepAlive timer.

      If a disconnect notification is received from the underlying
      transport protocol, the local system changes its state to Idle.

      In response to the Stop event (initiated by either system or
      operator) the local system sends a NOTIFICATION message with Error
      Code Cease and changes its state to Idle.

      The Start event is ignored in the OpenConfirm state.

      In response to any other event the local system sends a
      NOTIFICATION message with Error Code Finite State Machine Error
      and changes its state to Idle.

      Whenever BGMP changes its state from OpenConfirm to Idle, it
      closes the BGMP (and transport-level) connection and releases all
      resources associated with that connection.

   Established state:

      In the Established state BGMP can exchange UPDATE, NOTIFICATION,
      and KEEPALIVE messages with its peer.

      If the local system receives an UPDATE or KEEPALIVE message, it
      restarts its Hold Timer, if the negotiated Hold Time value is
      non-zero.

      If the local system receives a NOTIFICATION message, it changes
      its state to Idle.

      If the local system receives an UPDATE message and the UPDATE
      message error handling procedure (see Section 8.3) detects an
      error, the local system sends a NOTIFICATION message and changes
      its state to Idle.

      If a disconnect notification is received from the underlying
      transport protocol, the local system changes its state to Idle.

      If the Hold Timer expires, the local system sends a NOTIFICATION
      message with Error Code Hold Timer Expired and changes its state
      to Idle.

      If the KeepAlive timer expires, the local system sends a KEEPALIVE
      message and restarts its KeepAlive timer.
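
      The period used for the KeepAlive timer is not fixed by the FSM
      text.  A minimal sketch of one way to derive it follows, assuming
      the guidance in the KEEPALIVE Message Format section (one third
      of the Hold Time, never more often than one message per second,
      no periodic KEEPALIVEs when the Hold Time is zero) and the OPEN
      error handling rule that Hold Times of one or two seconds are
      rejected.  The function name keepalive_interval and the constant
      are hypothetical.

```python
from typing import Optional

# KEEPALIVE messages MUST NOT be sent more frequently than one per second.
MIN_KEEPALIVE_SPACING = 1.0


def keepalive_interval(hold_time: int) -> Optional[float]:
    """Return the KeepAlive timer period in seconds, or None if disabled.

    hold_time is the negotiated Hold Time in seconds.
    """
    if hold_time < 0:
        raise ValueError("Hold Time cannot be negative")
    if hold_time == 0:
        # Zero Hold Time: the Hold Timer and KeepAlive timer are not started.
        return None
    if hold_time in (1, 2):
        # These values must already have been rejected during OPEN handling.
        raise ValueError("Hold Time of 1 or 2 seconds is not acceptable")
    # One third of the Hold Time is the suggested maximum spacing.
    return max(MIN_KEEPALIVE_SPACING, hold_time / 3)
```

      With a negotiated Hold Time of 90 seconds this yields a 30 second
      period.  An implementation is free to choose a shorter period, as
      long as it stays at or above one second.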

      Each time the local system sends a KEEPALIVE or UPDATE message, it
      restarts its KeepAlive timer, unless the negotiated Hold Time
      value is zero.

      In response to the Stop event (initiated by either system or
      operator), the local system sends a NOTIFICATION message with
      Error Code Cease and changes its state to Idle.

      The Start event is ignored in the Established state.

      In response to any other event, the local system sends a
      NOTIFICATION message with Error Code Finite State Machine Error
      and changes its state to Idle.

      Whenever BGMP changes its state from Established to Idle, it
      closes the BGMP (and transport-level) connection, releases all
      resources associated with that connection, and deletes all routes
      derived from that connection.

11.  Security Considerations

   BGMP uses TCP sessions for all network communication between peers.
   TCP sessions may be secured through the use of IPsec [IPSEC].

12.  Authors' Addresses

   Dave Thaler
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   EMail: dthaler@microsoft.com

13.  Normative References

   [INTEROP]
      Thaler, D., "Interoperability Rules for Multicast Routing
      Protocols", RFC 2715, October 1999.

   [IPSEC]
      Kent, S., and R. Atkinson, "Security Architecture for the
      Internet Protocol", RFC 2401, November 1998.

   [RFC2119]
      Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

   [V4PREFIX]
      Thaler, D., "Unicast-Prefix-based IPv4 Multicast Addresses",
      draft-ietf-mboned-ipv4-uni-based-mcast-00.txt, Work in progress,
      June 2002.

   [V6PREFIX]
      Haberman, B., and D. Thaler, "Unicast-Prefix-based IPv6 Multicast
      Addresses", RFC 3306, August 2002.

14.  Non-normative References

   [BGP]
      Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
      RFC 1771, March 1995.

   [MBGP]
      Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
      Extensions for BGP-4", RFC 2858, June 2000.

   [CBT]
      Ballardie, A., "Core Based Trees (CBT version 2) Multicast
      Routing", RFC 2189, September 1997.

   [DVMRP]
      Pusateri, T., "Distance Vector Multicast Routing Protocol",
      draft-ietf-idmr-dvmrp-v3-10.txt, Work in progress, August 2000.

   [DWR]
      Fenner, W., "Domain-Wide Reports",
      draft-ietf-idmr-membership-reports-04.txt, Work in progress,
      August 1999.

   [IPv6AA]
      Hinden, R., and S. Deering, "IP Version 6 Addressing
      Architecture", RFC 3513, April 2003.

   [MOSPF]
      Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
      1994.

   [PIMDM]
      Adams, A., Nicholas, J., and W. Siadak, "Protocol Independent
      Multicast - Dense Mode (PIM-DM): Protocol Specification
      (Revised)", draft-ietf-pim-dm-new-v2-01.txt, Work in progress,
      February 2002.

   [PIMSM]
      Fenner, et al., "Protocol Independent Multicast-Sparse Mode
      (PIM-SM): Protocol Specification (Revised)",
      draft-ietf-pim-sm-v2-new-07.txt, March 2003.

   [REFLECT]
      Bates, T., and R. Chandra, "BGP Route Reflection: An alternative
      to full mesh IBGP", RFC 1966, June 1996.

15.  Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.
   However, this document itself
   may not be modified in any way, such as by removing the copyright
   notice or references to the Internet Society or other Internet
   organizations, except as needed for the purpose of developing
   Internet standards in which case the procedures for copyrights
   defined in the Internet Standards process must be followed, or as
   required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Table of Contents