idnits 2.17.1

draft-ietf-bgmp-spec-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There is 1 instance of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 474 has weird spacing: '... target list ...'

  == Line 1457 has weird spacing: '...is less than...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 2004) is 7187 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'BR32' is mentioned on line 273, but not defined

  == Missing Reference: 'BR41' is mentioned on line 273, but not defined

  == Missing Reference: 'BR31' is mentioned on line 275, but not defined

  == Missing Reference: 'BR42' is mentioned on line 275, but not defined

  == Missing Reference: 'BR43' is mentioned on line 275, but not defined

  == Missing Reference: 'BR22' is mentioned on line 277, but not defined

  == Missing Reference: 'BR52' is mentioned on line 277, but not defined

  == Missing Reference: 'BR53' is mentioned on line 277, but not defined

  == Missing Reference: 'BR21' is mentioned on line 279, but not defined

  == Missing Reference: 'BR51' is mentioned on line 279, but not defined

  == Missing Reference: 'BR12' is mentioned on line 281, but not defined

  == Missing Reference: 'BR61' is mentioned on line 281, but not defined

  == Missing Reference: 'BR13' is mentioned on line 283, but not defined

  == Missing Reference: 'BR71' is mentioned on line 287, but not defined

  == Missing Reference: 'BR81' is mentioned on line 287, but not defined

  == Missing Reference: 'BRXY' is mentioned on line 291, but not defined

  == Missing Reference: 'PIM-SM' is mentioned on line 311, but not defined

  == Missing Reference: 'RFC2385' is mentioned on line 1878, but not defined

  ** Obsolete undefined reference: RFC 2385 (Obsoleted by RFC 5925)

  == Unused Reference: 'IPSEC' is defined on line 1907, but no explicit
     reference was found in the text

  == Unused Reference: 'DVMRP' is defined on line 1935, but no explicit
     reference was found in the text

  == Unused Reference: 'MOSPF' is defined on line 1943, but no explicit
     reference was found in the text

  == Unused Reference: 'PIMDM' is defined on line 1947, but no explicit
     reference was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 2715 (ref.
     'INTEROP')

  ** Obsolete normative reference: RFC 2401 (ref. 'IPSEC') (Obsoleted by RFC
     4301)

  -- Obsolete informational reference (is this intentional?): RFC 1771 (ref.
     'BGP') (Obsoleted by RFC 4271)

  -- Obsolete informational reference (is this intentional?): RFC 2283 (ref.
     'MBGP') (Obsoleted by RFC 2858)

  -- Obsolete informational reference (is this intentional?): RFC 2373 (ref.
     'IPv6AA') (Obsoleted by RFC 3513)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-pim-dm-new-v2-04

  -- Obsolete informational reference (is this intentional?): RFC 2362 (ref.
     'PIMSM') (Obsoleted by RFC 4601, RFC 5059)

  -- Obsolete informational reference (is this intentional?): RFC 1966 (ref.
     'REFLECT') (Obsoleted by RFC 4456)

     Summary: 6 errors (**), 0 flaws (~~), 28 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

BGMP Working Group                                             D. Thaler
Internet Engineering Task Force                                Microsoft
INTERNET-DRAFT                                           19 January 2004
Expires July 2004

               Border Gateway Multicast Protocol (BGMP):
                        Protocol Specification

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.
It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

This document describes BGMP, a protocol for inter-domain multicast
routing. BGMP builds shared trees for active multicast groups, and
optionally allows receiver domains to build source-specific,
inter-domain, distribution branches where needed. BGMP natively supports

Draft                              BGMP                     January 2004

"source-specific multicast" (SSM). To also support "any-source
multicast" (ASM), BGMP requires that each multicast group be associated
with a single root (in BGMP it is referred to as the root domain). It
requires that different ranges of the multicast address space are
associated (e.g., with Unicast-Prefix-Based Multicast addressing) with
different domains. Each of these domains then becomes the root of the
shared domain-trees for all groups in its range. Multicast participants
will generally receive better multicast service if the session
initiator's address allocator selects addresses from its own domain's
part of the space, thereby causing the root domain to be local to at
least one of the session participants.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

1. Acknowledgements

In addition to the editors, the following individuals have
contributed to the design of BGMP: Cengiz Alaettinoglu, Tony
Ballardie, Steve Casner, Steve Deering, Deborah Estrin, Dino
Farinacci, Bill Fenner, Mark Handley, Ahmed Helmy, Van Jacobson, Dave
Meyer, and Satish Kumar.

This document is the product of the IETF BGMP Working Group with Dave
Thaler as editor.

Rusty Eddy, Isidor Kouvelas, and Pavlin Radoslavov also provided
valuable feedback on this document.

2. Purpose

It has been suggested that inter-domain "any-source" multicast is
better supported with a rendezvous mechanism whereby members receive
sources' data packets without any sort of global broadcast (e.g.,
MSDP broadcasts source information, PIM-DM and DVMRP broadcast
initial data packets, and MOSPF broadcasts membership information).
PIM-SM [PIMSM] and CBT [CBT] use a shared group-tree, to which all
members join and thereby hear from all sources (and to which
non-members do not join and thereby hear from no sources).

This document describes BGMP, a protocol for inter-domain multicast
routing. BGMP natively supports "source-specific multicast" (SSM).
To also support "any-source multicast" (ASM), BGMP builds shared
trees for active multicast groups, and allows domains to build
source-specific, inter-domain, distribution branches where needed.
Building upon concepts from PIM-SM and CBT, BGMP requires that each
global multicast group be associated with a single root. However, in
BGMP, the root is an entire exchange or domain, rather than a single
router.

For non-source-specific groups, BGMP assumes that ranges of the
multicast address space have been associated (e.g., with
Unicast-Prefix-Based Multicast [V4PREFIX,V6PREFIX] addressing) with
selected domains. Each such domain then becomes the root of the shared
domain-trees for all groups in its range. An address allocator will
generally achieve better distribution trees if it takes its multicast
addresses from its own domain's part of the space, thereby causing
the root domain to be local.

BGMP uses TCP as its transport protocol. This eliminates the need to
implement message fragmentation, retransmission, acknowledgement, and
sequencing.
BGMP uses TCP port 264 for establishing its connections.
This port is distinct from BGP's port to provide protocol
independence, and to facilitate distinguishing between protocol
packets (e.g., by packet classifiers or diagnostic utilities).

Two BGMP peers form a TCP connection between one another, and
exchange messages to open and confirm the connection parameters.
They then send incremental Join/Prune Updates as group memberships
change. BGMP does not require periodic refresh of individual
entries. KeepAlive messages are sent periodically to ensure the
liveness of the connection. Notification messages are sent in
response to errors or special conditions. If a connection encounters
an error condition, a notification message is sent and the connection
is closed if the error is a fatal one.

3. Revision History

This section should be removed by the RFC editor before publication.

19 January 2004 draft-06

(1) Removed NOTE from abstract per mailing list discussion about it
    being obsolete.

    Replaced "class D" terminology with non-IPv4-specific
    terminology.

    Added security considerations, with language similar to that in
    RFC 3618.

    Removed mention of Domain-Wide Reports since draft no longer
    exists, and there is no current demand for using dense-mode
    protocols as transit domains between sparse-mode domains.

    Updated references.

    Moved [V4PREFIX] from normative to informative, since the
    protocol can be implemented without it (but without it, the ASM
    model will only work in IPv6).

29 June 2002 draft-01

(1) Removed all references to MASC and G-RIB. The current spec only
    covers BGMP operation for source-specific groups, and
    any-source-multicast using unicast prefix-based multicast
    addresses (for both IPv4 and IPv6). No new routes of any type are
    needed in the routing table.

(2) Removed section on transitioning away from using DVMRP as the
    backbone to an AS-based multicast routing system with MBGP, as
    this has already happened.

4. Terminology

This document uses the following technical terms:

Domain:
   A set of one or more contiguous links and zero or more routers
   surrounded by one or more multicast border routers. Note that this
   loose definition of domain also applies to an external link between
   two domains, as well as an exchange.

Root Domain:
   When constructing a shared tree of domains for some group, one
   domain will be the "root" of the tree. The root domain receives
   data from each sender to the group, and functions as a rendezvous
   domain toward which member domains can send inter-domain joins, and
   to which sender domains can send data.

Multicast RIB:
   The Routing Information Base, or routing table, used to calculate
   the "next-hop" towards a particular address for multicast traffic.

Multicast IGP (M-IGP):
   A generic term for any multicast routing protocol used for tree
   construction within a domain. Typical examples of M-IGPs are:
   PIM-SM, PIM-DM, DVMRP, MOSPF, and CBT.

EGP:
   A generic term for the interdomain unicast routing protocol in use.
   Typically, this will be some version of BGP which can support a
   Multicast RIB, such as MBGP [MBGP], containing both unicast and
   multicast address prefixes.

Component:
   The portion of a border router associated with (and logically
   inside) a particular domain that runs the multicast IGP (M-IGP) for
   that domain, if any. Each border router thus has zero or more
   components inside routing domains. In addition, each border router
   with external links that do not fall inside any routing domain will
   have an inter-domain component that runs BGMP.

External peer:
   A border router in another multicast AS (autonomous system, as used
   in BGP), to which a BGMP TCP-connection is open. If BGP is being
   used as the EGP, a separate "eBGP" TCP-connection will also be open
   to the same peer.

Internal peer:
   Another border router of the same multicast AS. If BGP is being
   used as the EGP, the border router either speaks iBGP ("internal"
   BGP) directly to internal peers in a full mesh, or indirectly
   through a route reflector [REFLECT].

Next-hop peer:
   The next-hop peer towards a given IP address is the next EGP router
   on the path to the given address, according to multicast RIB routes
   in the EGP's routing table (e.g., in MBGP, routes whose Subsequent
   Address Family Identifier field indicates that the route is valid
   for multicast traffic).

target:
   Either an EGP peer, or an M-IGP component.

Tree State Table:
   This is a table of (S-prefix,G) and (*,G-prefix) entries that have
   been explicitly joined by a set of targets. Each entry has, in
   addition to the source and group addresses and masks, a list of
   targets that have explicitly requested data (on behalf of directly
   connected hosts or downstream routers). (S,G) entries also have an
   "SPT" bit.

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in
this document are to be interpreted as described in [RFC2119].

5. Protocol Overview

BGMP maintains group-prefix state in response to messages from BGMP
peers and notifications from M-IGP components. Group-shared trees
are rooted at the domain advertising the group prefix covering those
groups. When a receiver joins a specific group address, the border
router towards the root domain generates a group-specific Join
message, which is then forwarded Border-Router-by-Border-Router
towards the root domain (see Figure 1).
BGMP Join and Prune messages
are sent over TCP connections between BGMP peers, and BGMP protocol
state is refreshed by KEEPALIVE messages periodically sent over TCP.

BGMP routers build group-specific bidirectional forwarding state as
they process the BGMP Join messages. Bidirectional forwarding state
means that packets received from any target are forwarded to all
other targets in the target list without any RPF checks. No
group-specific state or traffic exists in parts of the network where
there are no members of that group.

BGMP routers optionally build source-specific unidirectional
forwarding state, only where needed, to be compatible with
source-specific trees (SPTs) used by some M-IGPs (e.g., DVMRP, PIM-DM,
or PIM-SM), or to construct trees for source-specific groups. A domain
that uses an SPT-based M-IGP may need to inject multicast packets
from external sources via different border routers (to be compatible
with the M-IGP RPF checks) which thus act as "surrogates". For
example, in the Transit_1 domain, data from Src_A arrives at BR12,
but must be injected by BR11. A surrogate router may create a
source-specific BGMP branch if no shared tree state exists. Note:
stub domains with a single border router, such as Rcvr_Stub_7 in
Figure 1, receive all multicast data packets through that router, to
which all RPF checks point. Therefore, stub domains never build
source-specific state.

                    Root_Domain
            [BR91]--------------------------\
              |                             |
            [BR32]                        [BR41]
           Transit_3                     Transit_4
            [BR31]                     [BR42] [BR43]
              |                          |      |
            [BR22]                     [BR52] [BR53]
           Transit_2                     Transit_5
            [BR21]                        [BR51]
              |                             |
            [BR12]                        [BR61]
          Transit_1[BR11]----------[BR62]Stub_6
            [BR13]                       (Src_A)
              |                          (Rcvr_D)
      -------------------
      |                 |
    [BR71]            [BR81]
   Rcvr_Stub_7     Src_only_Stub_8
   (Rcvr_C)           (Src_B)

Figure 1: Example inter-domain topology.
[BRXY] represents a BGMP border router. Transit_X is a transit domain
network. *_Stub_X is a stub domain network.

Data packets are forwarded based on a combination of BGMP and M-IGP
rules. The router forwards to a set of targets according to a
matching (S,G) BGMP tree state entry if it exists. If not found, the
router checks for a matching (*,G) BGMP tree state entry. If neither
is found, then the packet is sent natively to the next-hop EGP peer
for G, according to the Multicast RIB (for example, in the case of a
non-member sender such as Src_B in Figure 1). If a matching entry
was found, the packet is forwarded to all other targets in the target
list. In this way BGMP trees forward data in a bidirectional manner.
If a target is an M-IGP component then forwarding is subject to the
rules of that M-IGP protocol.

5.1. Design Rationale

Several other protocols, or protocol proposals, build shared trees
within domains [PIMSM,CBT]. The design choices made for BGMP
result from our focus on Inter-Domain multicast in particular. The
design choices made by PIM-SM and CBT are better suited to the
wide-area intra-domain case. There are three major differences between
BGMP and other shared-tree protocols:

(1) Unidirectional vs. Bidirectional trees

Bidirectional trees (using bidirectional forwarding state as
described above) minimize third party dependence which is essential
in the inter-domain context. For example, in Figure 1, stub domains
7 and 8 would like to exchange multicast packets without being
dependent on the quality of connectivity of the root domain.
However, unidirectional shared trees (i.e., those using RPF checks)
have more aggressive loop prevention and share the same processing
rules as source-specific entries which are inherently unidirectional.

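
The bidirectional forwarding rule described in the Protocol Overview can be sketched as follows. This is a minimal illustration, not part of the specification: the function name, the dictionary-based tree state table, and the exact-match keys (the draft uses prefixes) are assumptions made for brevity, and the SPT-bit check applied to (S,G) entries is omitted.

```python
def forward(tree_state, source, group, arrived_from, next_hop_peer):
    """Return the set of targets a data packet is forwarded to.

    tree_state maps (S, G) or ("*", G) to a target list (hypothetical
    layout). next_hop_peer is the next-hop EGP peer for G, taken from
    the Multicast RIB by the caller.
    """
    # Prefer a matching (S,G) entry, then fall back to (*,G).
    targets = tree_state.get((source, group))
    if targets is None:
        targets = tree_state.get(("*", group))
    if targets is None:
        # No tree state: send natively towards the root domain.
        return {next_hop_peer}
    # Bidirectional: all other targets, with no RPF check on arrival.
    return set(targets) - {arrived_from}


tree_state = {("*", "G1"): {"BR22", "BR13", "M-IGP"}}
print(forward(tree_state, "Src_A", "G1", "BR13", "BR22"))
print(forward(tree_state, "Src_B", "G2", "BR81", "BR13"))
```

Because the arrival target is simply excluded from the result, the same entry serves traffic flowing towards the root and away from it, which is what removes the dependence on the root domain.
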
The lack of third party dependence concerns in the INTRA domain case
reduces the incentive to employ bidirectional trees. BGMP supports
bidirectional trees because it has to, and because it can without
excessive cost.

(2) Source-specific distribution trees/branches

In a departure from other shared tree protocols, source-specific BGMP
state is built ONLY where (a) it is needed to pull the multicast
traffic down to a BGMP router that has source-specific (S,G) state,
and (b) that router is NOT already on the shared tree (i.e., has no
(*,G) state), and (c) that router does not want to receive packets
via encapsulation from a router which is on the shared tree. BGMP
provides source-specific branches because most M-IGP protocols in use
today build source-specific trees. BGMP's source-specific branches
eliminate the unnecessary overhead of encapsulations for high data
rate sources from the shared tree's ingress router to the surrogate
injector (e.g., from BR12 to BR11 in Figure 1). Moreover, cases in
which shared paths are significantly longer than SPT paths will also
benefit.

However, except for source-specific group distribution trees, we do
not build source-specific inter-domain trees in general because (a)
inter-domain connectivity is generally less rich than intra-domain
connectivity, so shared distribution trees should have more
acceptable path length and traffic concentration properties in the
inter-domain context than in the intra-domain case, and (b) by
having the shared tree state always take precedence over
source-specific tree state, we avoid ambiguities that can otherwise
arise.

In summary, BGMP trees are, in a sense, a hybrid between PIM-SM and
CBT trees.

(3) Method of choosing root of group shared tree

The choice of a group's shared-tree-root has implications for
performance and policy.
In the intra-domain case it is sometimes
assumed that all potential shared-tree roots (RPs/Cores) within the
domain are equally suited to be the root for a group that is
initiated within that domain. In the INTER-domain case, there is far
more opportunity for unacceptably poor locality, and administrative
control of a group's shared-tree root. Therefore in the intra-domain
case, other protocols sometimes treat all candidate roots (RPs or
Cores) as equivalent and emphasize load sharing and stability to
maximize performance. In the Inter-Domain case, all roots are not
equivalent, and we adopt an approach whereby a group's root domain is
not random but is subject to administrative control.

6. Protocol Details

In this section, we describe the detailed protocol that border
routers perform. We assume that each border router conforms to the
component-based model described in [INTEROP], modulo one correction
to section 3.2 ("BGMP" Dispatcher), as follows:

The iif owner of a (*,G) entry is the component owning the next-hop
interface towards the nominal root of G, in the multicast RIB.

6.1. Interaction with the EGP

The fundamental requirements imposed by BGMP are that:

(1) For a given source-specific group and source, BGMP must be able
    to look up the next-hop towards the source in the Multicast RIB,
    and

(2) For a given non-source-specific group, BGMP will map the group
    address to a nominal "root" address, and must be able to look up
    the next-hop towards that address in the Multicast RIB.

BGMP determines the nominal "root" address as follows. If the
multicast address is a Unicast-Prefix-based Multicast address, then the
nominal root address is the embedded unicast prefix, padded with a
suffix of 0 bits to form a full address.

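
The zero-padding step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not text from the specification: it assumes the IPv6 Unicast-Prefix-based layout in which the 64-bit network prefix field occupies octets 4 through 11 of the group address, and it takes the prefix length as an explicit argument rather than decoding it from the address. The function name `nominal_root` and the parameter `prefix_len` are hypothetical.

```python
import ipaddress


def nominal_root(group, prefix_len=64):
    """Derive a nominal root address from an IPv6 group address.

    Assumption: the embedded unicast prefix sits in octets 4..11 of
    the group address; prefix_len is supplied by the caller.
    """
    raw = ipaddress.IPv6Address(group).packed
    if raw[0] != 0xFF:
        raise ValueError("not an IPv6 multicast address")
    if not 0 < prefix_len <= 64:
        raise ValueError("prefix length must be between 1 and 64")
    field = int.from_bytes(raw[4:12], "big")
    # Keep the leading prefix_len bits of the 64-bit field.
    keep = ((1 << prefix_len) - 1) << (64 - prefix_len)
    # Shift into the high half; the low 64 bits are the 0-bit suffix.
    return ipaddress.IPv6Address((field & keep) << 64)


# Group address taken from this section's IPv6 example.
print(nominal_root("ff2e:0100:1234:5678:9abc:def0::123"))
```

The result is then looked up in the Multicast RIB like any other unicast address to find the next-hop peer towards the root domain.
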
For example, if the IPv6 group address is
ff2e:0100:1234:5678:9abc:def0::123, then the unicast prefix is
1234:5678:9abc:def0/64, and the nominal root address would be
1234:5678:9abc:def0::. (This address is in fact the subnet router
anycast address [IPv6AA].)

Support for any-source-multicast using any address other than a
Unicast-prefix-based Multicast Address is outside the scope of this
document.

6.2. Multicast Data Packet Processing

For BGMP rules to be applied, an incoming packet must first be
"accepted":

o  If the packet arrived on an interface owned by an M-IGP, the M-IGP
   component determines whether the packet should be accepted or
   dropped according to its rules. If the packet is accepted, the
   packet is forwarded (or not forwarded) out any other interfaces
   owned by the same component, as specified by the M-IGP.

o  If the packet was received over a point-to-point interface owned
   by BGMP, the packet is accepted.

o  If the packet arrived on a multiaccess network interface owned by
   BGMP, the packet is accepted if the router is receiving data on a
   source-specific branch, or if it is the designated forwarder for
   the longest matching route for S or for the longest matching route
   for the nominal root of G.

If the packet is accepted, then the router checks the tree state
table for a matching (S,G) entry. If one is found, but the packet
was not received from the next hop target towards S (if the entry's
SPT bit is True), or was not received from the next hop target
towards G (if the entry's SPT bit is False) then the packet is
dropped and no further actions are taken. If no (S,G) entry was
found, the router then checks for a matching (*,G) entry.

If neither is found, then the packet is forwarded towards the
next-hop peer for the nominal root of G, according to the Multicast
RIB.
If a matching entry was found, the packet is forwarded to all other
targets in the target list.

Forwarding to a target which is an M-IGP component means that the
packet is forwarded out any interfaces owned by that component
according to that component's multicast forwarding rules.

6.3. BGMP processing of Join and Prune messages and notifications

6.3.1. Receiving Joins

When the BGMP component receives a (*,G) or (S,G) Join alert from
another component, or a BGMP (S,G) or (*,G) Join message from an
external peer, it searches the tree state table for a matching entry.
If an entry is found, and that peer is already listed in the target
list, then no further actions are taken.

Otherwise, if no (*,G) or (S,G) entry was found, one is created. In
the case of a (*,G), the target list is initialized to contain the
next-hop peer towards the nominal root of G, if it is an external
peer. If the peer is internal, the target list is initialized to
contain the M-IGP component owning the next-hop interface. If there
is no next-hop peer (because the nominal root of G is inside the
domain), then the target list is initialized to contain the next-hop
component. If an (S,G) entry exists for the same G for which the
(*,G) Join is being processed, and the next-hop peers toward S and
the nominal root of G are different, the BGMP router must first send
a (S,G) Prune message toward the source and clear the SPT bit on the
(S,G) entry, before activating the (*,G) entry.

When creating (S,G) state, if the source is internal to the BGMP
speaker's domain, a "Poison-Reverse" bit (PR-bit) is set. This bit
indicates that the router may receive packets matching (S,G) anyway
due to the BGMP speaker being a member of a domain on the path
between S and the root domain.
(Depending on the M-IGP protocol, it
may in fact receive such packets anyway only if it is the best exit
for the nominal root of G.)

The target from which the Join was received is then added to the
target list. The router then looks up S or the nominal root of G in
the Multicast RIB to find the next-hop EGP peer. If the target list,
not including the next-hop target towards G for a (*,G) entry,
becomes non-null as a result, the next-hop EGP peer must be notified
as follows:

a) If the next-hop peer towards the nominal root of G (for a (*,G)
   entry) is an external peer, a BGMP (*,G) Join message is unicast
   to the external peer. If the next-hop peer towards S (for an
   (S,G) entry) is an external peer, and the router does NOT have any
   active (*,G) state for that group address G, a BGMP (S,G) Join
   message is unicast to the external peer. A BGMP (S,G) Join
   message is never sent to an external peer by a router that also
   contains active (*,G) state for the same group. If the next-hop
   peer towards S (for an (S,G) entry) is an external peer and the
   router DOES have active (*,G) state for that group G, the SPT bit
   is always set to False.

b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join
   alert is sent to the M-IGP component owning the next-hop
   interface.

c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent
   to the M-IGP component owning the next-hop interface.

Finally, if an (S,G) Join is received from an internal peer, the
peer should be stored with the M-IGP component target. If (S,G)
state exists with the PR-bit set, and the next-hop towards the
nominal root for G is through the M-IGP component, an (S,G)
Poison-Reverse message is immediately sent to the internal peer.

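
The (*,G) part of the Join handling above can be sketched as follows. This is a simplified illustration under assumptions, not a complete implementation: the names `StarGEntry` and `receive_star_g_join` are hypothetical, the next-hop target is kept apart from the other targets so that the "not including the next-hop target" test is a simple emptiness check, and (S,G) state, the SPT bit, and Poison-Reverse handling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class StarGEntry:
    upstream: str  # next-hop target towards the nominal root of G
    downstream: set = field(default_factory=set)  # targets that joined


def receive_star_g_join(table, group, joiner, upstream, upstream_is_external):
    """Process a (*,G) Join and return the upstream notification, if any.

    upstream and upstream_is_external stand in for a Multicast RIB
    lookup done by the caller (an assumption of this sketch).
    """
    entry = table.get(group)
    if entry is None:
        # New entry: the target list starts with the next-hop target.
        entry = table[group] = StarGEntry(upstream)
    if joiner == entry.upstream or joiner in entry.downstream:
        return None  # already listed, nothing more to do
    first_downstream = not entry.downstream
    entry.downstream.add(joiner)
    if not first_downstream:
        return None  # upstream was notified by an earlier Join
    if upstream_is_external:
        return ("BGMP Join message", entry.upstream, ("*", group))
    # Internal peer or no next-hop peer: alert the M-IGP component.
    return ("Join alert", entry.upstream, ("*", group))


table = {}
print(receive_star_g_join(table, "G", "BR22", "BR32", True))
print(receive_star_g_join(table, "G", "M-IGP", "BR32", True))
```

Only the first downstream target triggers a notification, which matches the incremental Join/Prune model: later Joins for the same group change local state only.
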
If an (S,G) Join is received from an external peer, and (S,G)
state exists with the PR-bit set, and the local BGMP speaker is
the best exit for the nominal root of G, and the next-hop towards
the nominal root for G is through the interface towards the
external peer, an (S,G) Poison-Reverse message is immediately sent
to the external peer.

6.3.2. Receiving Prune Notifications

When the BGMP component receives a (*,G) or (S,G) Prune alert from
another component, or a BGMP (*,G) or (S,G) Prune message from an
external peer, it searches the tree state table for a matching entry.
If no (S,G) entry was found for an (S,G) Prune, but (*,G) state
exists, an (S,G) entry is created, with the target list copied from
the (*,G) entry. If no matching entry exists, or if the component or
peer is not listed in the target list, no further actions are taken.

Otherwise, the component or peer is removed from the target list. If
the target list becomes null as a result, the next-hop peer towards
the nominal root of G (for a (*,G) entry), or towards S (for an (S,G)
entry if and only if the BGMP router does NOT have any corresponding
(*,G) entry), must be notified as follows.

a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune
   message is unicast to it.

b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune
   alert is sent to the M-IGP component owning the next-hop
   interface.

c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent
   to the M-IGP component owning the next-hop interface.

6.3.3. Receiving Route Change Notifications

When a border router receives a route for a new prefix in the
multicast RIB, or an existing route for a prefix is withdrawn, a route
change notification for that prefix must be sent to the BGMP
component.
In addition, when the next hop peer (according to the
multicast RIB) changes, a route change notification for that prefix
must be sent to the BGMP component.

In addition, in IPv4 (only), an internal route for each class-D
prefix associated with the domain (if any) MUST be injected into the
multicast RIB in the EGP by the domain's border routers.

When a route for a new group prefix is learned, or an existing route
for a group prefix is withdrawn, or the next-hop peer for a group
prefix changes, a BGMP router updates all affected (*,G) target
lists. The router sends a (*,G) Join to the new next-hop target, and
a (*,G) Prune to the old next-hop target, as appropriate. In
addition, if any (S,G) state exists with the PR-bit set:

o  If the BGMP speaker has just become the best exit for the nominal
   root of G, an (S,G) Poison Reverse message with the PR-bit set is
   sent as noted below.

o  If the BGMP speaker was the best exit for the nominal root of G
   and is no longer, an (S,G) Poison Reverse message with the PR-bit
   clear is sent as noted below.

The (S,G) Poison-Reverse messages are sent to all external peers on
the next-hop interface towards the nominal root of G from which (S,G)
Joins have been received.

When an existing route for a source prefix is withdrawn, or the
next-hop peer for a source prefix changes, a BGMP router updates all
affected (S,G) target lists. The router sends a (S,G) Join to the
new next-hop target, and a (S,G) Prune to the old next-hop target, as
appropriate.

6.3.4. Receiving (S,G) Poison-Reverse messages

When a BGMP speaker receives an (S,G) Poison-Reverse message from a
peer, it sets the PR-bit on the (S,G) state to match the PR-bit in
the message, and looks up the next-hop towards the nominal root of G.
604 If the next-hop target is an M-IGP component, it forwards the (S,G) 605 Poison-Reverse message to all internal peers of that component from 606 which it has received (S,G) Joins. If the next-hop target is an 607 external peer on a given interface, it forwards the (S,G) Poison- 608 Reverse message to all external peers on that interface. 610 When a BGMP speaker receives an (S,G) Poison-Reverse message from an 611 external peer, with the PR-bit set, and the speaker has received no 612 (S,G) Joins from any other peers (e.g., only from the M-IGP, or has 613 (S,G) state due to encapsulation as described in 6.4.1), it knows 614 that its own (S,G) Join is unnecessary, and should send an (S,G) 615 Prune. 617 When a BGMP speaker receives an (S,G) Poison-Reverse message from an 618 internal peer, with the PR-bit set, and the speaker is the best exit 619 for the nominal root of G, and has (S,G) prune state, an (S,G) Join 620 message is sent to cancel the prune state and the state is deleted. 622 6.4. Interaction with M-IGP components 624 When an M-IGP component on a border router first learns that there 625 are internally-reached members for a group G (whose scope is larger 626 than a single domain), a (*,G) Join alert is sent to the BGMP component. 627 Similarly, when an M-IGP component on a border router learns that 628 there are no longer internally-reached members for a group G (whose 629 scope is larger than a single domain), a (*,G) Prune alert is sent to 630 the BGMP component. 632 At any time, any M-IGP domain MAY decide to join a source-specific 633 branch for some external source S and group G. When the M-IGP 634 component in the border router that is the next-hop router for a 635 particular source S learns that a receiver wishes to receive data 639 from S on a source-specific path, an (S,G) Join alert is sent to the 640 BGMP component.
When it is learned that such receivers no longer 641 exist, an (S,G) Prune alert is sent to the BGMP component. Recall 642 that the BGMP component will generate external source-specific Joins 643 only where the source-specific branch does not coincide with the 644 shared tree distribution tree for that group. 646 Finally, we will require that the border router that is the next-hop 647 internal peer for a particular address S or the nominal root of G be 648 able to forward data for a matching tree state table entry to all 649 members within the domain. This requirement has implications on 650 specific M-IGPs as follows. 652 6.4.1. Interaction with DVMRP and PIM-DM 654 DVMRP and PIM-DM are both "broadcast and prune" protocols in which 655 every data packet must pass an RPF check against the packet's source 656 address, or be dropped. If the border router receiving packets from 657 an external source is the only BR to inject the route for the source 658 into the domain, then there are no problems. For example, this will 659 always be true for stub domains with a single border router (see 660 Figure 1). Otherwise, the border router receiving packets externally 661 is responsible for encapsulating the data to any other border routers 662 that must inject the data into the domain for RPF checks to succeed. 664 When an intended border router injector for a source receives 665 encapsulated packets from another border router in its domain, it 666 should create source-specific (S,G) BGMP state. Note that the border 667 router may be configured to do this on a data-rate triggered basis so 668 that the state is not created for very low data-rate/intermittent 669 sources. If source-specific state is created, then its incoming 670 interface points to the virtual encapsulation interface from the 671 border router that forwarded the packet, and it has an SPT flag that 672 is initialized to be False. 
674 When the (S,G) BGMP state is created, the BGMP component will in turn 675 send a BGMP (S,G) Join message to the next-hop external peer towards 676 S if there is no (*,G) state for that same group, G. The (S,G) BGMP 677 state will have the SPT bit set to False if (*,G) BGMP state is 678 present. 680 When the first data packet from S arrives from the external peer and 681 matches on the BGMP (S,G) state, and IF there is no (*,G) state, the 682 router sets the SPT flag to True, resets the incoming interface to 684 Draft BGMP January 2004 686 point to the external peer, and sends a BGMP (S,G) Prune message to 687 the border router that was encapsulating the packets (e.g., in Figure 688 1, BR11 sends the (Src_A,G) Prune to BR12). When the border router 689 with (*,G) state receives the prune for (S,G), it then deletes that 690 border router from its list of targets. 692 If the decapsulator receives a (S,G) Poison Reverse message with the 693 PR-bit set, it will forward it to the encapsulator (which may again 694 forward it up the shared tree according to normal BGMP rules), and 695 both will delete their BGMP (S,G) state. 697 PIM-DM and DVMRP present an additional problem, i.e., no protocol 698 mechanism exists for joining and pruning entire groups; only joins 699 and prunes for individual sources are available. As a result, BGMP 700 does not currently support such protocols being used in a transit 701 domain. 703 6.4.2. Interaction with PIM-SM 705 Protocols such as PIM-SM build unidirectional shared and source- 706 specific trees. As with DVMRP and PIM-DM, every data packet must 707 pass an RPF check against some group-specific or source-specific 708 address. 710 The fewest encapsulations/decapsulations will be done when the intra- 711 domain tree is rooted at the next-hop internal peer (which becomes 712 the RP) towards the nominal root of G, since in general that router 713 will receive the most packets from external sources. 
To achieve 714 this, each BGMP border router to a PIM-SM domain should send 715 Candidate-RP-Advertisements within the domain for those groups for 716 which it is the shared-domain tree ingress router. When the border 717 router that is the RP for a group G receives an external data packet, 718 it forwards the packet according to the M-IGP (i.e., PIM-SM) shared- 719 tree outgoing interface list. 721 Other border routers will receive data packets from external sources 722 that are farther down the bidirectional tree of domains. When a 723 border router that is not the RP receives an external packet for 724 which it does not have a source-specific entry, the border router 725 treats it like a local source by creating (S,G) state with a Register 726 flag set, based on normal PIM-SM rules; the border router then 727 encapsulates the data packets in PIM-SM Registers and unicasts them 728 to the RP for the group. As explained above, the RP for the inter- 732 domain group will be one of the other border routers of the domain. 734 If a source's data rate is high enough, DRs within the PIM-SM domain 735 may switch to the shortest path tree. If the shortest path to an 736 external source is via the group's ingress router for the shared 737 tree, the new (S,G) state in the BGMP border router will not cause 738 BGMP (S,G) Joins because that border router will already have (*,G) 739 state. If, however, the shortest path to an external source is via 740 some other border router, that border router will create (S,G) BGMP 741 state in response to the M-IGP (S,G) Join alert. In this case, 742 because there is no local (*,G) state to suppress it, the border 743 router will send a BGMP (S,G) Join to the next-hop external peer 744 towards S, in order to pull the data down directly. (See BR11 in 745 Figure 1.)
As in normal PIM-SM operation, those PIM-SM routers that 746 have (*,G) and (S,G) state pointing to different incoming interfaces 747 will prune that source off the shared tree. Therefore, all internal 748 interfaces may eventually be pruned off the internal shared tree. 750 After the border router sends a BGMP (S,G) Join, if its (S,G) state 751 has the PR-bit clear, an (S,G) Poison-Reverse message (with the PR-bit 752 clear) is sent to the ingress router for G. The ingress router then 753 creates (S,G) state if it does not already exist, and removes the next hop 754 towards the nominal root of G from the target list. 756 If the border router later receives an (S,G) Poison-Reverse message 757 with the PR-bit set, the Poison-Reverse message is forwarded to the 758 ingress router for G. The best-exit router then creates (S,G) state 759 if it does not already exist, and puts the next hop towards the 760 nominal root of G in the target list if not already present. 762 6.4.3. Interaction with CBT 764 CBT builds bidirectional shared trees but must address two points of 765 compatibility with BGMP. First, CBT cannot accommodate more than 766 one border router injecting a packet. Therefore, if a CBT domain 767 does have multiple external connections, the M-IGP components of the 768 border routers are responsible for ensuring that only one of them 769 will inject data from any given source. 771 Second, CBT cannot process source-specific Joins or Prunes. Two 772 options thus exist for each CBT domain: 774 Option A: 775 The CBT component interprets an (S,G) Join alert as if it were a 779 (*,G) Join alert, as described in [INTEROP]. That is, if it is not 780 already on the core-tree for G, then it sends a CBT (*,G) JOIN- 781 REQUEST message towards the core for G. Similarly, when the CBT 782 component receives an (S,G) Prune alert, and the child interface 783 list for a group is NULL, then it sends a (*,G) QUIT_NOTIFICATION 784 towards the core for G.
This option has the disadvantage of 785 pulling all data for the group G down to the CBT domain when no 786 members exist. 788 Option B: 789 The CBT domain does not propagate any routes to its external 790 peers for the Multicast RIB unless it is known that no other path 791 exists to that prefix (e.g., routes for prefixes internal to the 792 domain or in a singly-homed customer's domain may be propagated). 793 This ensures that source-specific joins are never received unless 794 the source's data already passes through the domain on the shared 795 tree, in which case the (S,G) Join need not be propagated anyway. 796 BGMP border routers will only send source-specific Joins or Prunes 797 to an external peer if that external peer advertises source- 798 prefixes in the EGP. If a BGMP-CBT border router does receive an 799 (S,G) Join or Prune, that border router should ignore the message. 801 To minimize en/de-capsulations, CBTv2 BRs may follow the same 802 scheme as described under PIM-SM above, in which Candidate-Core 803 advertisements are sent by each BR for those groups for which it is the 804 shared-tree ingress router. 806 6.4.4. Interaction with MOSPF 808 As with CBTv2, MOSPF cannot process source-specific Joins or Prunes, 809 and the same two options are available. Therefore, an MOSPF domain 810 may either: 812 Option A: 813 send a Group-Membership-LSA for all of G in response to an (S,G) 814 Join alert, and "prematurely age" it out (when no other downstream 815 members exist) in response to an (S,G) Prune alert, OR 817 Option B: 818 not propagate any routes to its external peers for the Multicast 819 RIB unless it is known that no other path exists to that prefix 820 (e.g., routes for prefixes internal to the domain or in a singly- 821 homed customer's domain may be propagated). 825 6.5. Operation over Multi-access Networks 827 Multiaccess links require special handling to prevent duplicates.
828 The following mechanism enables BGMP to operate over multiaccess 829 links which do not run an M-IGP. This avoids broadcast-and-prune 830 behavior and does not require (S,G) state. 832 To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF 833 message to exchange "forwarder preference" values for each prefix. 834 The peer with the highest forwarder preference becomes the designated 835 forwarder, with ties broken by lowest BGMP Identifier. The 836 designated forwarder is the router responsible for forwarding packets 837 up the tree, and is the peer to which joins will be sent. 839 When BGMP first learns that a route exists in the multicast RIB whose 840 next-hop interface is NOT the multiaccess link, the BGMP router sends 841 a BGMP FWDR_PREF message for the prefix, to all BGMP peers on the 842 LAN. The FWDR_PREF message contains a "forwarder preference value" 843 for the local router, and the same value MUST be sent to all peers on 844 the LAN. Likewise, when the prefix is no longer reachable, a 845 FWDR_PREF of 0 is sent to all peers on the LAN. 847 Whenever a BGMP router calculates the next-hop peer towards a 848 particular address, and that peer is reached over a BGMP-owned 849 multiaccess LAN, the designated forwarder is used instead. 851 When a BGMP router receives a FWDR_PREF message from a peer, it looks 852 up the matching route in its multicast RIB, and calculates the new 853 designated forwarder. If the router has tree state entries whose 854 parent target was the old forwarder, it sends Joins to the new 855 forwarder and Prunes to the old forwarder. 857 When a BGMP router which is NOT the designated forwarder receives a 858 packet on the multiaccess link, it is silently dropped. 860 Finally, this mechanism prevents duplicates where full peering exists 861 on a "logical" link. 
Where full peering does not exist, steps must 862 be taken (outside of BGMP) to present separate logical interfaces to 863 BGMP, each of which is a link with full peering. This might entail, 864 for example, using different link-layer address mappings, doing 865 encapsulation, or changing the physical media. 867 Draft BGMP January 2004 869 6.6. Interaction between (S,G) state and G-routes 871 As discussed earlier, routers with (*,G) state will not propagate (S,G) 872 joins. However, a special case occurs when (S,G) state coincides with 873 the G-route (or route towards the nominal root of G). When this occurs, 874 care must be taken so that the data will reach the root domain without 875 causing duplicates or black holes. For this reason, (S,G) state on the 876 path between the source and the root domain is annotated as being 877 "poison-reversed". A PR-bit is kept for this purpose, which is updated 878 by (UN)POISON_REVERSE messages. 880 The PR-bit indicates to BGMP nodes whether they need to forward packets 881 up towards the root domain. For example, in a case where an (S,G) 882 branch exists, a transit domain may get packets along the (S,G) branch, 883 and needs to know whether to (also) forward them up towards the root 884 domain. If the domain in question is on the path between S and the root 885 domain, then the answer is yes (and the PR bit will be set on the S,G 886 state). If the domain in question is not on the path between S and the 887 root domain, then the answer is no (and the PR bit will be clear on the 888 S,G state). 890 7. Message Formats 892 This section describes message formats used by BGMP. 894 Messages are sent over a reliable transport protocol connection. A 895 message is processed only after it is entirely received. The maximum 896 message size is 4096 octets. All implementations are required to 897 support this maximum message size. 899 All fields labelled "Reserved" below must be transmitted as 0, and 900 ignored upon receipt. 902 7.1. 
Message Header Format 904 Each message has a fixed-size (4-byte) header. There may or may not 905 be a data portion following the header, depending on the message 906 type. The layout of these fields is shown below: 908 0 1 2 3 909 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 910 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 911 | Length | Type | Reserved | 913 Draft BGMP January 2004 915 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 917 Length: 918 This 2-octet unsigned integer indicates the total length of the 919 message, including the header, in octets. Thus, e.g., it allows 920 one to locate in the transport-level stream the start of the next 921 message. The value of the Length field must always be at least 4 922 and no greater than 4096, and may be further constrained, depending 923 on the message type. No "padding" of extra data after the message 924 is allowed, so the Length field must have the smallest value 925 required given the rest of the message. 927 Type: 928 This 1-octet unsigned integer indicates the type code of the 929 message. The following type codes are defined: 931 1 - OPEN 932 2 - UPDATE 933 3 - NOTIFICATION 934 4 - KEEPALIVE 936 7.2. OPEN Message Format 938 After a transport protocol connection is established, the first 939 message sent by each side is an OPEN message. If the OPEN message is 940 acceptable, a KEEPALIVE message confirming the OPEN is sent back. 941 Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION 942 messages may be exchanged. 
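As an illustration of the fixed header described in 7.1, the following minimal Python sketch parses the 4-octet header and splits a transport byte stream into whole messages. It is not part of the specification. The function names (parse_header, split_stream) are hypothetical, and network byte order for the Length field is an assumption, since the text above does not state the byte order.

```python
import struct

HEADER_LEN = 4          # fixed-size header (7.1)
MAX_MESSAGE_LEN = 4096  # maximum message size (section 7)

# Type codes listed in 7.1.
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_header(buf: bytes):
    """Return (length, type_name) for the message at the start of buf.

    Assumption: the 2-octet Length field is in network byte order.
    """
    if len(buf) < HEADER_LEN:
        raise ValueError("need at least 4 octets for a header")
    length, msg_type, _reserved = struct.unpack("!HBB", buf[:HEADER_LEN])
    # The Reserved octet is ignored on receipt.
    if not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
        raise ValueError(f"bad message length {length}")
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"bad message type {msg_type}")
    return length, MESSAGE_TYPES[msg_type]


def split_stream(stream: bytes):
    """Yield each complete message in stream; stop at a partial message."""
    offset = 0
    while len(stream) - offset >= HEADER_LEN:
        length, _ = parse_header(stream[offset:])
        if len(stream) - offset < length:
            break  # a message is processed only once entirely received
        yield stream[offset:offset + length]
        offset += length
```

A receiver would typically append newly read bytes to a buffer, call split_stream on it, and keep any unconsumed tail for the next read.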
944 In addition to the fixed-size BGMP header, the OPEN message contains 945 the following fields: 947 0 1 2 3 948 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 949 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 950 | Version | Rsvd| AddrFam | Hold Time | 951 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 952 | BGMP Identifier (variable length) | 953 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 954 | | 955 + (Optional Parameters) | 956 | | 957 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 959 Draft BGMP January 2004 961 Version: 962 This 1-octet unsigned integer indicates the protocol version number 963 of the message. The current BGMP version number is 1. 965 AddrFam: 966 The IANA-assigned address family number of the BGMP Identifier. 967 These include (among others): 969 Number Description 970 ------ ----------- 971 1 IP (IP version 4) 972 2 IPv6 (IP version 6) 974 Hold Time: 975 This 2-octet unsigned integer indicates the number of seconds that 976 the sender proposes for the value of the Hold Timer. Upon receipt 977 of an OPEN message, a BGMP speaker MUST calculate the value of the 978 Hold Timer by using the smaller of its configured Hold Time and the 979 Hold Time received in the OPEN message. The Hold Time MUST be 980 either zero or at least three seconds. An implementation may 981 reject connections on the basis of the Hold Time. The calculated 982 value indicates the maximum number of seconds that may elapse 983 between the receipt of successive KEEPALIVE, and/or UPDATE messages 984 by the sender. 986 BGMP Identifier: 987 This 4-octet (for IPv4) or 16-octet (IPv6) unsigned integer 988 indicates the BGMP Identifier of the sender. A given BGMP speaker 989 sets the value of its BGMP Identifier to a globally-unique value 990 assigned to that BGMP speaker (e.g., an IPv4 address). 
The value 991 of the BGMP Identifier is determined on startup and is the same for 992 every BGMP session opened. 994 Optional Parameters: 995 This field may contain a list of optional parameters, where each 996 parameter is encoded as a <Parameter Type, Parameter Length, 997 Parameter Value> triplet. The combined length of all optional 998 parameters can be derived from the Length field in the message 999 header. 1003 0 1 1004 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1005 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... 1006 | Parm. Type | Parm. Length | Parameter Value (variable) 1007 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... 1009 Parameter Type is a one octet field that unambiguously identifies 1010 individual parameters. Parameter Length is a one octet field that 1011 contains the length of the Parameter Value field in octets. 1012 Parameter Value is a variable length field that is interpreted 1013 according to the value of the Parameter Type field. 1015 This document defines the following Optional Parameters: 1017 a) Authentication Information (Parameter Type 1): 1018 This optional parameter may be used to authenticate a BGMP peer. 1019 The Parameter Value field contains a 1-octet Authentication Code 1020 followed by a variable length Authentication Data. 1022 0 1 2 3 4 5 6 7 8 1023 +-+-+-+-+-+-+-+-+ 1024 | Auth. Code | 1025 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1026 | | 1027 | Authentication Data | 1028 | | 1029 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1031 Authentication Code: 1033 This 1-octet unsigned integer indicates the authentication 1034 mechanism being used. Whenever an authentication mechanism is 1035 specified for use within BGMP, two things must be included in 1036 the specification: 1038 - the value of the Authentication Code which indicates use of the 1039 mechanism, and 1040 - the form and meaning of the Authentication Data. 1041 Note that a separate authentication mechanism may be used in 1042 establishing the transport level connection.
1044 Authentication Data: 1046 The form and meaning of this variable-length field 1050 depend on the Authentication Code. 1052 The minimum length of the OPEN message is 12 octets (including 1053 message header). 1055 b) Capability Information (Parameter Type 2): 1056 This is an Optional Parameter that is used by a BGMP-speaker to 1057 convey to its peer the list of capabilities supported by the 1058 speaker. The parameter contains one or more triples <Capability 1059 Code, Capability Length, Capability Value>, where each triple is 1060 encoded as shown below: 1061 +------------------------------+ 1062 | Capability Code (1 octet) | 1063 +------------------------------+ 1064 | Capability Length (1 octet) | 1065 +------------------------------+ 1066 | Capability Value (variable) | 1067 +------------------------------+ 1068 Capability Code: 1070 Capability Code is a one octet field that unambiguously identifies 1071 individual capabilities. 1073 Capability Length: 1075 Capability Length is a one octet field that contains the length of 1076 the Capability Value field in octets. 1078 Capability Value: 1080 Capability Value is a variable length field that is interpreted 1081 according to the value of the Capability Code field. 1083 A particular capability, as identified by its Capability Code, may 1084 occur more than once within the Optional Parameter. 1086 This document reserves Capability Codes 128-255 for vendor-specific 1087 applications. 1089 This document reserves Capability Code 0. 1091 Capability Codes (other than those reserved for vendor specific use) 1092 are assigned only by the IETF consensus process and IESG approval. 1096 7.3. UPDATE Message Format 1098 UPDATE messages are used to transfer Join/Prune/FwdrPref information 1099 between BGMP peers. The UPDATE message always includes the fixed- 1100 size BGMP header, and one or more attributes as described below.
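Looking back at the Optional Parameters field of the OPEN message (7.2), both the parameters and the capabilities nested inside a Capability Information parameter use the same 1-octet type, 1-octet length, variable value layout. The Python sketch below walks that layout. It is only an illustration: the function names and the returned tuple shape are hypothetical, and how an implementation reacts to an unknown parameter type (for example, with an Unsupported Optional Parameter NOTIFICATION) is left to the caller.

```python
PARAM_AUTHENTICATION = 1  # Authentication Information (7.2, item a)
PARAM_CAPABILITY = 2      # Capability Information (7.2, item b)


def parse_tlvs(data: bytes):
    """Split data into (code, value) pairs.

    Each element is a 1-octet code, a 1-octet length, then `length`
    octets of value.
    """
    items = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated type/length octets")
        code, length = data[offset], data[offset + 1]
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("value runs past the end of the field")
        items.append((code, value))
        offset += 2 + length
    return items


def parse_optional_parameters(data: bytes):
    """Return (auth, capabilities, unknown_types) for an OPEN message.

    auth is None or (auth_code, auth_data); capabilities is a list of
    (capability_code, capability_value) pairs; unknown_types lists the
    parameter types this sketch does not recognize.
    """
    auth = None
    capabilities = []
    unknown_types = []
    for ptype, value in parse_tlvs(data):
        if ptype == PARAM_AUTHENTICATION:
            if not value:
                raise ValueError("missing Authentication Code")
            auth = (value[0], value[1:])
        elif ptype == PARAM_CAPABILITY:
            # The same capability code may legitimately appear more than once.
            capabilities.extend(parse_tlvs(value))
        else:
            unknown_types.append(ptype)
    return auth, capabilities, unknown_types
```

The byte string handed to parse_optional_parameters is whatever remains of the OPEN message after the BGMP Identifier, since the combined length of the parameters is derived from the header Length field.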
1102 The message format below allows compact encoding of (*,G) Joins and 1103 Prunes, while allowing the flexibility needed to do other updates 1104 such as (S,G) Joins and Prunes towards sources as well as on the 1105 shared tree. In the discussion below, an Encoded-Address-Prefix is 1106 of the form: 1107 0 1 2 3 1108 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1109 +-+-+-+-+-+-+-+-+ 1110 |EnTyp| AddrFam | 1111 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1112 | Address (variable length) | 1113 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1114 | Mask (variable length) | 1115 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1117 EnTyp: 1118 0 - All 1's Mask. The Mask field is 0 bytes long. 1119 1 - Mask length included. The Mask field is 4 bytes long, and 1120 contains the mask length, in bits. 1121 2 - Full Mask included. The Mask field is the same length 1122 as the Address field, and contains the full bitmask. 1124 AddrFam: 1125 The IANA-assigned address family number of the encoded prefix. 1127 Address: 1128 The address associated with the given prefix to be encoded. The 1129 length is determined based on the Address Family. 1131 Mask: 1132 The mask associated with the given prefix. The format (or absence) 1133 of this field is determined by the EnTyp field. 1135 Each attribute is of the form: 1137 0 1 2 3 1138 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1139 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1140 | Length | Type | Data ... 1142 Draft BGMP January 2004 1144 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1145 All attributes are 4-byte aligned. 1147 Length: 1148 The Length is the length of the entire attribute, including the 1149 length, type, and data fields. If other attributes are nested 1150 within the data field, the length includes the size of all such 1151 nested attributes. 
1153 Type: 1155 Types 128-255 are reserved for "optional" attributes. If a 1156 required attribute is unrecognized, a NOTIFICATION will be sent and 1157 the connection will be closed if the error is a fatal one. 1158 Unrecognized optional attributes are simply ignored. 1160 0 - JOIN 1161 1 - PRUNE 1162 2 - GROUP 1163 3 - SOURCE 1164 4 - FWDR_PREF 1165 5 - POISON_REVERSE 1167 a) JOIN (Type Code 0) 1169 The JOIN attribute indicates that all GROUP or SOURCE options 1170 nested immediately within the JOIN option should be joined. 1172 0 1 2 3 1173 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1174 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1175 | Length | Type=0 | Reserved | 1176 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1177 | Nested Attributes ... 1178 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1179 No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested 1180 within a JOIN attribute. 1182 b) PRUNE (Type Code 1) 1184 The PRUNE attribute indicates that all GROUP or SOURCE attributes 1185 nested immediately within the PRUNE attribute should be pruned. 1187 Draft BGMP January 2004 1189 0 1 2 3 1190 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1191 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1192 | Length | Type=1 | Reserved | 1193 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1194 | Nested Attributes ... 1195 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1196 No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested 1197 within a PRUNE attribute. 1199 c) GROUP (Type Code 2) 1201 The GROUP attribute identifies a given group-prefix. In addition, 1202 any attributes nested immediately within the GROUP attribute also 1203 apply to the given group-prefix. 
1205 0 1 2 3 1206 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1207 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1208 | Length | Type=2 | | 1209 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1210 | | 1211 | Encoded-Address-Prefix | 1212 | | 1213 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1214 | Nested Attributes (optional) ... 1215 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1216 Encoded-Address-Prefix 1217 The multicast group prefix to be joined or 1218 pruned, 1219 in the format described above. 1220 Nested Attributes No GROUP, SOURCE, or FWDR_PREF attributes may 1221 be 1222 immediately nested within a GROUP attribute. 1224 d) SOURCE (Type Code 3): 1226 The SOURCE attribute identifies a given source-prefix. In 1227 addition, any attributes nested immediately within the SOURCE 1228 attribute also apply to the given source-prefix. 1230 The SOURCE attribute has the following format: 1232 0 1 2 3 1233 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1237 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1238 | Length | Type=3 | | 1239 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1240 | | 1241 | Encoded-Address-Prefix | 1242 | | 1243 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1244 | Nested Attributes (optional) ... 1245 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1246 Encoded-Address-Prefix 1247 The Source-prefix in the format described 1248 above. 1249 Nested Attributes No GROUP, SOURCE, or FWDR_PREF attributes may 1250 be 1251 immediately nested within a SOURCE attribute. 1253 e) FWDR_PREF (Type Code 4) 1255 The FWDR_PREF attribute provides a forwarder preference value for 1256 all GROUP or SOURCE attributes nested immediately within the 1257 FWDR_PREF attribute.
It is used by a BGMP speaker to inform other 1258 BGMP speakers of the originating speaker's degree of preference for 1259 a given group or source prefix. Usage of this attribute is 1260 described in 6.5. 1262 0 1 2 3 1263 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1264 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1265 | Length | Type=4 | Reserved | 1266 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1267 | Preference Value | 1268 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1269 | Nested Attributes ... 1270 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1271 Preference Value A 32-bit non-negative integer. 1272 Nested Attributes No JOIN, PRUNE, or FWDR_PREF attributes may be 1273 immediately nested within a FWDR_PREF 1274 attribute. 1276 f) POISON_REVERSE (Type Code 5) 1278 The POISON_REVERSE attribute provides a "poison-reverse" (PR-bit) 1279 value for all SOURCE attributes nested immediately within the 1280 POISON_REVERSE attribute. It is used by a BGMP speaker to inform 1281 other BGMP speakers from which it has received (S,G) Joins that 1285 they are on the path of domains between the source and the root 1286 domain. 1288 0 1 2 3 1289 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1290 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1291 | Length | Type=5 | Reserved |P| 1292 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1293 | Nested Attributes ... 1294 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1295 P The PR-bit value. 1296 Nested Attributes No attributes in this document other than SOURCE 1297 may be immediately nested within a 1298 POISON_REVERSE 1299 attribute. 1301 7.4.
Encoding examples 1303 Below are enumerated examples of how various updates are built using 1304 nested attributes, where A ( B ) denotes that attribute B is nested 1305 within attribute A. 1306 (*,G-prefix) Join: JOIN ( GROUP ) 1307 (*,G-prefix) Prune: PRUNE ( GROUP ) 1308 (S,G) Join towards S : GROUP ( JOIN ( SOURCE ) ) 1309 (S,G) Join cancelling prune towards root of G: GROUP ( JOIN ( SOURCE ) ) 1310 (S,G) Prune towards S: GROUP ( PRUNE ( SOURCE ) ) 1311 (S,G) Prune towards root of G: GROUP ( PRUNE ( SOURCE ) ) 1312 Switch from (*,G) to (S,G): PRUNE ( GROUP ( JOIN ( SOURCE ) ) ) 1313 Switch from (S,G) to (*,G): JOIN ( GROUP ) 1314 Initial (*,G) Join with S pruned: JOIN ( GROUP ( PRUNE ( SOURCE ) ) ) 1315 Forwarder preference announcement for G-prefix: FWDR_PREF ( GROUP ) 1316 Forwarder preference announcement for S-prefix: FWDR_PREF ( SOURCE ) 1318 7.5. KEEPALIVE Message Format 1320 BGMP does not use any transport protocol-based keep-alive mechanism 1321 to determine if peers are reachable. Instead, KEEPALIVE messages are 1322 exchanged between peers often enough as not to cause the Hold Timer 1323 to expire. A reasonable maximum time between the last KEEPALIVE or 1324 UPDATE message sent, and the time at which a KEEPALIVE message is 1325 sent, would be one third of the Hold Time interval. KEEPALIVE 1326 messages MUST NOT be sent more frequently than one per second. An 1327 implementation MAY adjust the rate at which it sends KEEPALIVE 1329 Draft BGMP January 2004 1331 messages as a function of the Hold Time interval. 1333 If the negotiated Hold Time interval is zero, then periodic KEEPALIVE 1334 messages MUST NOT be sent. 1336 A KEEPALIVE message consists of only a message header, and has a 1337 length of 4 octets. 1339 7.6. NOTIFICATION Message Format 1341 A NOTIFICATION message is sent when an error condition is detected. 1342 The BGMP connection is closed immediately after sending it if the 1343 error is a fatal one. 
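The timer rules stated for the Hold Time field (7.2) and for KEEPALIVE messages (7.5) can be sketched in Python as follows. This is an illustration under stated assumptions, not normative text: the function names are hypothetical, and using exactly one third of the Hold Time is just one choice, since 7.5 only calls that a reasonable maximum interval.

```python
def negotiate_hold_time(configured: int, received: int) -> int:
    """Return the Hold Time to use: the smaller of the two values.

    Both values are in seconds and must be zero or at least three.
    """
    for value in (configured, received):
        if value != 0 and value < 3:
            raise ValueError("Hold Time must be zero or at least three seconds")
    return min(configured, received)


def keepalive_interval(hold_time: int):
    """Return seconds between KEEPALIVEs, or None if none are to be sent.

    Uses one third of the Hold Time, but never less than one second,
    because KEEPALIVEs must not be sent more often than once per second.
    """
    if hold_time == 0:
        return None  # periodic KEEPALIVEs MUST NOT be sent
    return max(1.0, hold_time / 3)
```

For example, a speaker configured with a 90-second Hold Time that receives 30 seconds in the peer's OPEN would use 30 seconds and send a KEEPALIVE roughly every 10 seconds.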
1345 In addition to the fixed-size BGMP header, the NOTIFICATION message
1346 contains the following fields:
1347  0                   1                   2                   3
1348  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1349 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1350 |O| Error code  | Error subcode |   Data                        |
1351 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1352 |                                                               |
1353 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1355 O-bit:
1356    Open-bit. If clear, the connection will be closed.
1357    If set, indicates the error is not fatal.

1359 Error Code:

1361    This 1-octet unsigned integer indicates the type of
1362    NOTIFICATION. The following Error Codes have been defined:

1364    Error Code   Symbolic Name                 Reference

1366    1            Message Header Error          Section 8.1

1368    2            OPEN Message Error            Section 8.2

1370    3            UPDATE Message Error          Section 8.3

1372    4            Hold Timer Expired            Section 8.5

1374    5            Finite State Machine Error    Section 8.6

1378    6            Cease                         Section 8.7

1380 Error subcode:

1382    This 1-octet unsigned integer provides more specific
1383    information about the nature of the reported error. Each Error
1385    Code may have one or more Error Subcodes associated with it.
1386    If no appropriate Error Subcode is defined, then a zero
1387    (Unspecific) value is used for the Error Subcode field.
1388    The notation (MC) below indicates the error is a fatal one
1389    and the O-bit must be clear. Non-fatal subcodes SHOULD
1390    be sent with the O-bit set.
1392 Message Header Error subcodes:

1394    2 - Bad Message Length (MC)
1395    3 - Bad Message Type (MC)

1397 OPEN Message Error subcodes:

1399    1 - Unsupported Version (MC)
1400    4 - Unsupported Optional Parameter
1401    5 - Authentication Failure (MC)
1402    6 - Unacceptable Hold Time (MC)
1403    7 - Unsupported Capability (MC)

1405 UPDATE Message Error subcodes:

1407    1 - Malformed Attribute List (MC)
1408    2 - Unrecognized Attribute Type
1409    5 - Attribute Length Error (MC)
1410   10 - Invalid Address
1411   11 - Invalid Mask
1412   13 - Unrecognized Address Family
1413 Data:
1414 This variable-length field is used to diagnose the reason for the
1415 NOTIFICATION. The contents of the Data field depend upon the
1416 Error Code and Error Subcode. See Section 8 below for more
1417 details.

1419 Note that the length of the Data field can be determined from the
1420 message Length field by the formula:

1422    Message Length = 6 + Data Length

1426 The minimum length of the NOTIFICATION message is 6 octets
1427 (including message header).

1429 8. BGMP Error Handling

1431 This section describes actions to be taken when errors are detected
1432 while processing BGMP messages. BGMP Error Handling is similar to
1433 that of BGP [BGP].

1435 When any of the conditions described here are detected, a
1436 NOTIFICATION message with the indicated Error Code, Error Subcode,
1437 and Data fields is sent, and the BGMP connection is closed if the
1438 error is a fatal one. If no Error Subcode is specified, then a zero
1439 must be used.

1441 The phrase "the BGMP connection is closed" means that the transport
1442 protocol connection has been closed and that all resources for that
1443 BGMP connection have been deallocated. The remote peer is removed
1444 from the target list of all tree state entries.

1446 Unless specified explicitly, the Data field of the NOTIFICATION
1447 message that is sent to indicate an error is empty.

1449 8.1.
Message Header error handling

1451 All errors detected while processing the Message Header are indicated
1452 by sending the NOTIFICATION message with Error Code Message Header
1453 Error. The Error Subcode elaborates on the specific nature of the
1454 error.

1456 If the Length field of the message header is less than 4 or greater
1457 than 4096, or if the Length field of an OPEN message is less than
1458 the minimum length of the OPEN message, or if the Length field of an
1459 UPDATE message is less than the minimum length of the UPDATE message,
1460 or if the Length field of a KEEPALIVE message is not equal to 4, then
1461 the Error Subcode is set to Bad Message Length. The Data field
1462 contains the erroneous Length field.

1464 If the Type field of the message header is not recognized, then the
1465 Error Subcode is set to Bad Message Type. The Data field contains
1466 the erroneous Type field.

1470 8.2. OPEN message error handling

1472 All errors detected while processing the OPEN message are indicated
1473 by sending the NOTIFICATION message with Error Code OPEN Message
1474 Error. The Error Subcode elaborates on the specific nature of the
1475 error.

1477 If the version number contained in the Version field of the received
1478 OPEN message is not supported, then the Error Subcode is set to
1479 Unsupported Version Number. The Data field is a 2-octet unsigned
1480 integer, which indicates the largest locally supported version number
1481 less than the version the remote BGMP peer bid (as indicated in the
1482 received OPEN message).

1484 If the Hold Time field of the OPEN message is unacceptable, then the
1485 Error Subcode MUST be set to Unacceptable Hold Time. An
1486 implementation MUST reject Hold Time values of one or two seconds.
1487 An implementation MAY reject any proposed Hold Time. An
1488 implementation which accepts a Hold Time MUST use the negotiated
1489 value for the Hold Time.
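
As an illustration only, the Length checks of Section 8.1 and the Hold
Time rule above might be written as in the following Python sketch. The
function names, the symbolic message type strings, and the local_minimum
policy parameter are assumptions of this example and are not defined by
this document. The minimum OPEN and UPDATE lengths are passed in by the
caller because they are defined with the message formats, not here.

```python
HEADER_LEN = 4          # a KEEPALIVE is a bare message header (Section 7.5)
MAX_MESSAGE_LEN = 4096  # upper bound on the header Length field (Section 8.1)


def bad_message_length(msg_type: str, length: int,
                       open_min: int, update_min: int) -> bool:
    """Return True if the Length field warrants Bad Message Length.

    msg_type is a symbolic name chosen for this sketch; open_min and
    update_min are the minimum OPEN and UPDATE lengths, supplied by the
    caller.
    """
    if length < HEADER_LEN or length > MAX_MESSAGE_LEN:
        return True
    if msg_type == "KEEPALIVE":
        return length != HEADER_LEN
    if msg_type == "OPEN":
        return length < open_min
    if msg_type == "UPDATE":
        return length < update_min
    return False


def hold_time_acceptable(proposed: int, local_minimum: int = 3) -> bool:
    """Return True if a proposed Hold Time (in seconds) is accepted."""
    if proposed in (1, 2):
        return False  # one or two seconds MUST be rejected
    if proposed == 0:
        return True   # assumption: zero (no periodic KEEPALIVEs) is accepted
    # Any value MAY be rejected; local_minimum is a hypothetical local policy.
    return proposed >= local_minimum
```

A speaker that rejects the Hold Time would then send a NOTIFICATION with
Error Code OPEN Message Error and Error Subcode Unacceptable Hold Time.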
1491 If one of the Optional Parameters in the OPEN message is not
1492 recognized, then the Error Subcode is set to Unsupported Optional
1493 Parameter.

1495 If the OPEN message carries Authentication Information (as an
1496 Optional Parameter), then the corresponding authentication procedure
1497 is invoked. If the authentication procedure (based on Authentication
1498 Code and Authentication Data) fails, then the Error Subcode is set to
1499 Authentication Failure.

1501 If the OPEN message indicates that the peer does not support a
1502 capability which the receiver requires, the receiver may send a
1503 NOTIFICATION message to the peer, and terminate peering. The Error
1504 Subcode in the message is set to Unsupported Capability. The Data
1505 field in the NOTIFICATION message lists the set of capabilities that
1506 cause the speaker to send the message. Each such capability is
1507 encoded the same way as it was encoded in the received OPEN message.

1509 8.3. UPDATE message error handling

1511 All errors detected while processing the UPDATE message are indicated
1512 by sending the NOTIFICATION message with Error Code UPDATE Message
1516 Error. The Error Subcode elaborates on the specific nature of the
1517 error.

1519 If any recognized attribute has Attribute Length that conflicts with
1520 the expected length (based on the attribute type code), then the
1521 Error Subcode is set to Attribute Length Error. The Data field
1522 contains the erroneous attribute (type, length and value).

1524 If the Encoded-Address-Prefix field in some attribute is
1525 syntactically incorrect, then the Error Subcode is set to Invalid
1526 Prefix Field.

1528 If any other error is encountered when processing attributes (such as
1529 invalid nestings), then the Error Subcode is set to Malformed
1530 Attribute List, and the problematic attribute is included in the Data
1531 field.

1533 8.4.
NOTIFICATION message error handling

1535 If a peer sends a NOTIFICATION message, and there is an error in that
1536 message, there is unfortunately no means of reporting this error via
1537 a subsequent NOTIFICATION message. Any such error, such as an
1538 unrecognized Error Code or Error Subcode, should be noticed, logged
1539 locally, and brought to the attention of the administration of the
1540 peer. The means to do this, however, lies outside the scope of this
1541 document.

1543 8.5. Hold Timer Expired error handling

1545 If a system does not receive successive KEEPALIVE and/or UPDATE
1546 and/or NOTIFICATION messages within the period specified in the Hold
1547 Time field of the OPEN message, then the NOTIFICATION message with
1548 Hold Timer Expired Error Code must be sent and the BGMP connection
1549 closed.

1551 8.6. Finite State Machine error handling

1553 Any error detected by the BGMP Finite State Machine (e.g., receipt of
1554 an unexpected event) is indicated by sending the NOTIFICATION message
1555 with Error Code Finite State Machine Error.

1559 8.7. Cease

1561 In absence of any fatal errors (that are indicated in this section),
1562 a BGMP peer may choose at any given time to close its BGMP connection
1563 by sending the NOTIFICATION message with Error Code Cease. However,
1564 the Cease NOTIFICATION message must not be used when a fatal error
1565 indicated by this section does exist.

1567 8.8. Connection collision detection

1569 If a pair of BGMP speakers try simultaneously to establish a TCP
1570 connection to each other, then two parallel connections between this
1571 pair of speakers might well be formed. We refer to this situation as
1572 connection collision. Clearly, one of these connections must be
1573 closed.

1575 Based on the value of the BGMP Identifier a convention is established
1576 for detecting which BGMP connection is to be preserved when a
1577 collision does occur.
The convention is to compare the BGMP
1578 Identifiers of the peers involved in the collision and to retain only
1579 the connection initiated by the BGMP speaker with the higher-valued
1580 BGMP Identifier.

1582 Upon receipt of an OPEN message, the local system must examine all of
1583 its connections that are in the OpenConfirm state. A BGMP speaker
1584 may also examine connections in an OpenSent state if it knows the
1585 BGMP Identifier of the peer by means outside of the protocol. If
1586 among these connections there is a connection to a remote BGMP
1587 speaker whose BGMP Identifier equals the one in the OPEN message,
1588 then the local system performs the following collision resolution
1589 procedure:

1591 1. The BGMP Identifier of the local system is compared to the BGMP
1592 Identifier of the remote system (as specified in the OPEN message).

1594 2. If the value of the local BGMP Identifier is less than the remote
1595 one, the local system closes the BGMP connection that already exists
1596 (the one that is already in the OpenConfirm state), and accepts the
1597 BGMP connection initiated by the remote system.

1599 3. Otherwise, the local system closes the newly created BGMP connection
1600 (the one associated with the newly received OPEN message), and
1601 continues to use the existing one (the one that is already in the
1602 OpenConfirm state).

1606 Comparing BGMP Identifiers is done by treating them as (4-octet long)
1607 unsigned integers.

1609 A connection collision with an existing BGMP connection that is in
1610 the Established state causes unconditional closing of the newly created
1611 connection. Note that a connection collision cannot be detected with
1612 connections that are in the Idle, Connect, or Active states.

1614 Closing the BGMP connection (that results from the collision
1615 resolution procedure) is accomplished by sending the NOTIFICATION
1616 message with the Error Code Cease.

1618 9.
BGMP Version Negotiation

1620 BGMP speakers may negotiate the version of the protocol by making
1621 multiple attempts to open a BGMP connection, starting with the
1622 highest version number each supports. If an open attempt fails with
1623 an Error Code OPEN Message Error, and an Error Subcode Unsupported
1624 Version Number, then the BGMP speaker has available the version
1625 number it tried, the version number its peer tried, the version
1626 number passed by its peer in the NOTIFICATION message, and the
1627 version numbers that it supports. If the two peers do support one or
1628 more common versions, then this will allow them to rapidly determine
1629 the highest common version. In order to support BGMP version
1630 negotiation, future versions of BGMP must retain the format of the
1631 OPEN and NOTIFICATION messages.

1633 9.1. BGMP Capability Negotiation

1635 When a BGMP speaker sends an OPEN message to its BGMP peer, the
1636 message may include an Optional Parameter, called Capabilities. The
1637 parameter lists the capabilities supported by the speaker.

1639 A BGMP speaker may use a particular capability when peering with
1640 another speaker only if both speakers support that capability. A
1641 BGMP speaker determines the capabilities supported by its peer by
1642 examining the list of capabilities present in the Capabilities
1643 Optional Parameter carried by the OPEN message that the speaker
1644 receives from the peer.

1648 10. BGMP Finite State machine

1650 This section specifies BGMP operation in terms of a Finite State
1651 Machine (FSM). Following is a brief summary and overview of BGMP
1652 operations by state as determined by this FSM.

1654 Initially BGMP is in the Idle state.

1656 Idle state:

1658 In this state BGMP refuses all incoming BGMP connections. No
1659 resources are allocated to the peer.
In response to the Start
1660 event (initiated by either system or operator) the local system
1661 initializes all BGMP resources, starts the ConnectRetry timer,
1662 initiates a transport connection to the other BGMP peer, while
1663 listening for a connection that may be initiated by the remote
1664 BGMP peer, and changes its state to Connect. The exact value of
1665 the ConnectRetry timer is a local matter, but should be
1666 sufficiently large to allow TCP initialization.

1668 If a BGMP speaker detects an error, it shuts down the connection
1669 and changes its state to Idle. Getting out of the Idle state
1670 requires generation of the Start event. If such an event is
1671 generated automatically, then persistent BGMP errors may result in
1672 persistent flapping of the speaker. To avoid such a condition it
1673 is recommended that Start events should not be generated
1674 immediately for a peer that was previously transitioned to Idle
1675 due to an error. For a peer that was previously transitioned to
1676 Idle due to an error, the time between consecutive generation of
1677 Start events, if such events are generated automatically, shall
1678 exponentially increase. The value of the initial timer shall be 60
1679 seconds. The time shall be doubled for each consecutive retry.

1681 Any other event received in the Idle state is ignored.

1683 Connect state:

1685 In this state BGMP is waiting for the transport protocol
1686 connection to be completed.

1688 If the transport protocol connection succeeds, the local system
1689 clears the ConnectRetry timer, completes initialization, sends an
1690 OPEN message to its peer, and changes its state to OpenSent. If
1691 the transport protocol connect fails (e.g., retransmission
1692 timeout), the local system restarts the ConnectRetry timer,
1696 continues to listen for a connection that may be initiated by the
1697 remote BGMP peer, and changes its state to Active state.
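
Returning briefly to the Idle state: the spacing of automatically
generated Start events described there (60 seconds initially, doubled
for each consecutive retry) can be sketched in Python as follows. The
function name and the optional cap parameter are assumptions of this
example; the text above specifies no upper bound on the delay.

```python
from typing import Optional

INITIAL_DELAY = 60  # seconds, the initial timer value given for the Idle state


def start_event_delay(retry: int, cap: Optional[int] = None) -> int:
    """Delay in seconds before automatic Start event number `retry`.

    retry counts consecutive error-triggered returns to Idle, starting
    at 0. cap is hypothetical: it is not part of the protocol text.
    """
    if retry < 0:
        raise ValueError("retry must be non-negative")
    delay = INITIAL_DELAY * (2 ** retry)  # doubled for each consecutive retry
    return delay if cap is None else min(delay, cap)
```

With no cap the delays run 60, 120, 240, 480 seconds and so on. An
implementation would presumably reset the retry count once a connection
has been established successfully, although the text does not say so.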
1699 In response to the ConnectRetry timer expired event, the local
1700 system restarts the ConnectRetry timer, initiates a transport
1701 connection to the other BGMP peer, continues to listen for a
1702 connection that may be initiated by the remote BGMP peer, and
1703 stays in the Connect state.

1705 The Start event is ignored in the Connect state.

1707 In response to any other event (initiated by either system or
1708 operator), the local system releases all BGMP resources associated
1709 with this connection and changes its state to Idle.

1711 Active state:

1713 In this state BGMP is trying to acquire a peer by listening for an
1714 incoming transport protocol connection.

1716 If the transport protocol connection succeeds, the local system
1717 clears the ConnectRetry timer, completes initialization, sends an
1718 OPEN message to its peer, sets its Hold Timer to a large value,
1719 and changes its state to OpenSent. A Hold Timer value of 4
1720 minutes is suggested.

1722 In response to the ConnectRetry timer expired event, the local
1723 system restarts the ConnectRetry timer, initiates a transport
1724 connection to other BGMP peer, continues to listen for a
1725 connection that may be initiated by the remote BGMP peer, and
1726 changes its state to Connect.

1728 If the local system detects that a remote peer is trying to
1729 establish BGMP connection to it, and the IP address of the remote
1730 peer is not an expected one, the local system restarts the
1731 ConnectRetry timer, rejects the attempted connection, continues to
1732 listen for a connection that may be initiated by the remote BGMP
1733 peer, and stays in the Active state.

1735 The Start event is ignored in the Active state.

1737 In response to any other event (initiated by either system or
1738 operator), the local system releases all BGMP resources associated
1739 with this connection and changes its state to Idle.
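
The state changes described so far for the Idle, Connect, and Active
states can be summarized as a small transition table. The following
Python sketch is illustrative only: the State and Event names are labels
invented for this example, and the side effects (timers, sending OPEN,
releasing resources) are omitted. Events not listed for Connect or
Active fall under "any other event" and lead to Idle.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()


class Event(Enum):
    START = auto()
    TRANSPORT_OPEN = auto()          # transport connection succeeded
    TRANSPORT_OPEN_FAILED = auto()   # e.g., retransmission timeout
    CONNECT_RETRY_EXPIRED = auto()
    UNEXPECTED_PEER = auto()         # attempt from an unexpected IP address
    STOP = auto()


# Transitions stated explicitly for the Connect and Active states.
TRANSITIONS = {
    (State.CONNECT, Event.TRANSPORT_OPEN): State.OPEN_SENT,
    (State.CONNECT, Event.TRANSPORT_OPEN_FAILED): State.ACTIVE,
    (State.CONNECT, Event.CONNECT_RETRY_EXPIRED): State.CONNECT,
    (State.CONNECT, Event.START): State.CONNECT,   # Start is ignored
    (State.ACTIVE, Event.TRANSPORT_OPEN): State.OPEN_SENT,
    (State.ACTIVE, Event.CONNECT_RETRY_EXPIRED): State.CONNECT,
    (State.ACTIVE, Event.UNEXPECTED_PEER): State.ACTIVE,
    (State.ACTIVE, Event.START): State.ACTIVE,     # Start is ignored
}


def next_state(state: State, event: Event) -> State:
    """Return the state that follows `event` in `state`."""
    if state is State.IDLE:
        # Only Start leaves Idle; every other event is ignored.
        return State.CONNECT if event is Event.START else State.IDLE
    if state in (State.CONNECT, State.ACTIVE):
        # Any event not listed above returns the connection to Idle.
        return TRANSITIONS.get((state, event), State.IDLE)
    raise ValueError("state not covered by this sketch")
```

Keeping the transitions in a table makes it straightforward to compare
an implementation against the prose one state at a time.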
1743 OpenSent state:

1745 In this state BGMP waits for an OPEN message from its peer. When
1746 an OPEN message is received, all fields are checked for
1747 correctness. If the BGMP message header checking or OPEN message
1748 checking detects an error (see Section 8.2), or a connection
1749 collision (see Section 8.8) the local system sends a NOTIFICATION
1750 message and changes its state to Idle.

1752 If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
1753 message and sets a KeepAlive timer. The Hold Timer, which was
1754 originally set to a large value (see above), is replaced with the
1755 negotiated Hold Time value (see Section 7.2). If the negotiated
1756 Hold Time value is zero, then the Hold Time timer and KeepAlive
1757 timers are not started. If the value of the Autonomous System
1758 field is the same as the local Autonomous System number, then the
1759 connection is an "internal" connection; otherwise, it is
1760 "external". Finally, the state is changed to OpenConfirm.

1762 If a disconnect notification is received from the underlying
1763 transport protocol, the local system closes the BGMP connection,
1764 restarts the ConnectRetry timer, while continuing to listen for a
1765 connection that may be initiated by the remote BGMP peer, and goes
1766 into the Active state.

1768 If the Hold Timer expires, the local system sends a NOTIFICATION
1769 message with Error Code Hold Timer Expired and changes its state
1770 to Idle.

1772 In response to the Stop event (initiated by either system or
1773 operator) the local system sends a NOTIFICATION message with Error
1774 Code Cease and changes its state to Idle.

1776 The Start event is ignored in the OpenSent state.

1778 In response to any other event the local system sends a NOTIFICATION
1779 message with Error Code Finite State Machine Error and changes its
1780 state to Idle.
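
The timer setup performed when a valid OPEN message is received in the
OpenSent state can be sketched as follows, combining it with the
KEEPALIVE limits of Section 7.5 (at most one third of the Hold Time
between messages, and never more than one KEEPALIVE per second). The
function name and the use of None for "timer not started" are
assumptions of this Python example; how the Hold Time is negotiated is
described with the OPEN message and is taken here as an input.

```python
from typing import Optional, Tuple


def timers_after_open(
    negotiated_hold_time: int,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (hold_timer, keepalive_timer) in seconds.

    None means the timer is not started, which is the case when the
    negotiated Hold Time is zero.
    """
    if negotiated_hold_time < 0:
        raise ValueError("Hold Time must be non-negative")
    if negotiated_hold_time == 0:
        return (None, None)
    # One third of the Hold Time, but never faster than one per second.
    keepalive = max(1.0, negotiated_hold_time / 3)
    return (float(negotiated_hold_time), keepalive)
```

For a negotiated Hold Time of 90 seconds this gives a KeepAlive timer of
30 seconds. An implementation MAY choose a shorter interval, subject to
the one-per-second limit.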
1782 Whenever BGMP changes its state from OpenSent to Idle, it closes
1783 the BGMP (and transport-level) connection and releases all
1784 resources associated with that connection.

1786 OpenConfirm state:

1790 In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.

1792 If the local system receives a KEEPALIVE message, it changes its
1793 state to Established.

1795 If the Hold Timer expires before a KEEPALIVE message is received,
1796 the local system sends NOTIFICATION message with error code Hold
1797 Timer Expired and changes its state to Idle.

1799 If the local system receives a NOTIFICATION message, it changes
1800 its state to Idle.

1802 If the KeepAlive timer expires, the local system sends a KEEPALIVE
1803 message and restarts its KeepAlive timer.

1805 If a disconnect notification is received from the underlying
1806 transport protocol, the local system changes its state to Idle.

1808 In response to the Stop event (initiated by either system or
1809 operator) the local system sends NOTIFICATION message with Error
1810 Code Cease and changes its state to Idle.

1812 The Start event is ignored in the OpenConfirm state.

1814 In response to any other event the local system sends NOTIFICATION
1815 message with Error Code Finite State Machine Error and changes its
1816 state to Idle.

1818 Whenever BGMP changes its state from OpenConfirm to Idle, it
1819 closes the BGMP (and transport-level) connection and releases all
1820 resources associated with that connection.

1822 Established state:

1824 In the Established state BGMP can exchange UPDATE, NOTIFICATION,
1825 and KEEPALIVE messages with its peer.

1827 If the local system receives an UPDATE or KEEPALIVE message, it
1828 restarts its Hold Timer, if the negotiated Hold Time value is
1829 non-zero.

1831 If the local system receives a NOTIFICATION message, it changes
1832 its state to Idle.
1834 If the local system receives an UPDATE message and the UPDATE
1838 message error handling procedure (see Section 8.3) detects an
1839 error, the local system sends a NOTIFICATION message and changes
1840 its state to Idle.

1842 If a disconnect notification is received from the underlying
1843 transport protocol, the local system changes its state to Idle.

1845 If the Hold Timer expires, the local system sends a NOTIFICATION
1846 message with Error Code Hold Timer Expired and changes its state
1847 to Idle.

1849 If the KeepAlive timer expires, the local system sends a KEEPALIVE
1850 message and restarts its KeepAlive timer.

1852 Each time the local system sends a KEEPALIVE or UPDATE message, it
1853 restarts its KeepAlive timer, unless the negotiated Hold Time
1854 value is zero.

1856 In response to the Stop event (initiated by either system or
1857 operator), the local system sends a NOTIFICATION message with
1858 Error Code Cease and changes its state to Idle.

1860 The Start event is ignored in the Established state.

1862 In response to any other event, the local system sends a
1863 NOTIFICATION message with Error Code Finite State Machine Error
1864 and changes its state to Idle.

1866 Whenever BGMP changes its state from Established to Idle, it
1867 closes the BGMP (and transport-level) connection, releases all
1868 resources associated with that connection, and deletes all routes
1869 derived from that connection.

1871 11. Security Considerations

1873 If a BGMP speaker accepts unauthorized or altered BGMP messages, denial
1874 of service due to excess bandwidth consumption or lack of multicast
1875 connectivity can result. Authentication of BGMP messages can protect
1876 against this behavior.

1878 A BGMP implementation MUST implement Keyed MD5 [RFC2385] to secure
1879 control messages, and MUST be capable of interoperating with peers that
1880 do not support it.
However, if one side of the connection is configured
1881 with Keyed MD5 and the other side is not, the connection SHOULD NOT be
1884 established.

1886 This provides a weak security mechanism, as it is still possible for
1887 denial of service to occur as a result of messages relayed through a
1888 trusted peer. However, this model is the same as the currently
1889 practiced security mechanism for BGP. It is anticipated that future
1890 work will provide different stronger mechanisms for dealing with these
1891 issues in routing protocols.

1893 12. Authors' Address

1895 Dave Thaler
1896 Microsoft
1897 One Microsoft Way
1898 Redmond, WA 98052
1899 EMail: dthaler@microsoft.com

1901 13. Normative References

1903 [INTEROP]
1904 Thaler, D., "Interoperability Rules for Multicast Routing
1905 Protocols", RFC 2715, October 1999.

1907 [IPSEC]
1908 Kent, S. and R. Atkinson, "Security Architecture for the Internet
1909 Protocol", RFC 2401, November 1998.

1911 [RFC2119]
1912 S. Bradner, "Key words for use in RFCs to Indicate Requirement
1913 Levels", BCP 14, RFC 2119, March 1997.

1915 [V6PREFIX]
1916 Haberman, B. and D. Thaler, "Unicast-Prefix-based IPv6 Multicast
1917 Addresses", RFC 3306, August 2002.

1919 14. Informative References

1921 [BGP]
1922 Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
1923 1771, March 1995.

1927 [MBGP]
1928 Bates, T., Chandra, R., Katz, D. and Y. Rekhter, "Multiprotocol
1929 Extensions for BGP-4", RFC 2283, February 1998.

1931 [CBT]
1932 Ballardie, A., "Core Based Trees (CBT version 2) Multicast
1933 Routing", RFC 2189, September 1997.

1935 [DVMRP]
1936 Pusateri, T., "Distance Vector Multicast Routing Protocol", draft-
1937 ietf-idmr-dvmrp-v3-11.txt, Work in progress, October 2003.

1939 [IPv6AA]
1940 Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture",
1941 RFC 2373, July 1998.
1943 [MOSPF]
1944 Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
1945 1994.

1947 [PIMDM]
1948 Adams, A., Nicholas, J. and W. Siadak, "Protocol Independent
1949 Multicast - Dense Mode (PIM-DM): Protocol Specification (Revised)",
1950 draft-ietf-pim-dm-new-v2-04.txt, Work in progress, September 2003.

1952 [PIMSM]
1953 Estrin, et al., "Protocol Independent Multicast-Sparse Mode (PIM-
1954 SM): Protocol Specification", RFC 2362, June 1998.

1956 [REFLECT]
1957 Bates, T. and R. Chandra, "BGP Route Reflection: An alternative to
1958 full mesh IBGP", RFC 1966, June 1996.

1960 [V4PREFIX]
1961 D. Thaler, "Unicast-Prefix-based IPv4 Multicast Addresses", draft-
1962 thaler-ipv4-uni-based-mcast-00.txt, Work in progress, November
1963 2001.

1965 15. Full Copyright Statement

1967 Copyright (C) The Internet Society (2004). All Rights Reserved.

1971 This document and translations of it may be copied and furnished to
1972 others, and derivative works that comment on or otherwise explain it or
1973 assist in its implementation may be prepared, copied, published and
1974 distributed, in whole or in part, without restriction of any kind,
1975 provided that the above copyright notice and this paragraph are included
1976 on all such copies and derivative works. However, this document itself
1977 may not be modified in any way, such as by removing the copyright notice
1978 or references to the Internet Society or other Internet organizations,
1979 except as needed for the purpose of developing Internet standards in
1980 which case the procedures for copyrights defined in the Internet
1981 Standards process must be followed, or as required to translate it into
1981 languages other than English.

1983 The limited permissions granted above are perpetual and will not be
1984 revoked by the Internet Society or its successors or assigns.
1986 This document and the information contained herein is provided on an "AS 1987 IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK 1988 FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT 1989 LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT 1990 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR 1991 FITNESS FOR A PARTICULAR PURPOSE. 1993 Table of Contents 1995 1 Acknowledgements ................................................ 2 1996 2 Purpose ......................................................... 2 1997 3 Revision History ................................................ 3 1998 4 Terminology ..................................................... 4 1999 5 Protocol Overview ............................................... 6 2000 5.1 Design Rationale .............................................. 7 2001 6 Protocol Details ................................................ 9 2002 6.1 Interaction with the EGP ...................................... 9 2003 6.2 Multicast Data Packet Processing .............................. 10 2004 6.3 BGMP processing of Join and Prune messages and notifications 2005 .............................................................. 11 2006 6.3.1 Receiving Joins ............................................. 11 2007 6.3.2 Receiving Prune Notifications ............................... 12 2008 6.3.3 Receiving Route Change Notifications ........................ 13 2009 6.3.4 Receiving (S,G) Poison-Reverse messages ..................... 14 2010 6.4 Interaction with M-IGP components ............................. 14 2011 6.4.1 Interaction with DVMRP and PIM-DM ........................... 15 2012 6.4.2 Interaction with PIM-SM ..................................... 16 2013 6.4.3 Interaction with CBT ........................................ 17 2014 6.4.4 Interaction with MOSPF ...................................... 
18 2015 Draft BGMP January 2004 2017 6.5 Operation over Multi-access Networks .......................... 19 2018 6.6 Interaction between (S,G) state and G-routes .................. 20 2019 7 Message Formats ................................................. 20 2020 7.1 Message Header Format ......................................... 20 2021 7.2 OPEN Message Format ........................................... 21 2022 7.3 UPDATE Message Format ......................................... 25 2023 7.4 Encoding examples ............................................. 29 2024 7.5 KEEPALIVE Message Format ...................................... 29 2025 7.6 NOTIFICATION Message Format ................................... 30 2026 8 BGMP Error Handling ............................................. 32 2027 8.1 Message Header error handling ................................. 32 2028 8.2 OPEN message error handling ................................... 33 2029 8.3 UPDATE message error handling ................................. 33 2030 8.4 NOTIFICATION message error handling ........................... 34 2031 8.5 Hold Timer Expired error handling ............................. 34 2032 8.6 Finite State Machine error handling ........................... 34 2033 8.7 Cease ......................................................... 35 2034 8.8 Connection collision detection ................................ 35 2035 9 BGMP Version Negotiation ........................................ 36 2036 9.1 BGMP Capability Negotiation ................................... 36 2037 10 BGMP Finite State machine ...................................... 37 2038 11 Security Considerations ........................................ 41 2039 12 Authors' Address ............................................... 42 2040 13 Normative References ........................................... 42 2041 14 Informative References ......................................... 
42 2042 15 Full Copyright Statement ....................................... 43