idnits 2.17.1 draft-ietf-bgmp-spec-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 43 longer pages, the longest (page 2) being 113 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 44 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([MASC]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 1403 has weird spacing: '...is less than...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? 
(The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'BR32' is mentioned on line 231, but not defined == Missing Reference: 'BR41' is mentioned on line 231, but not defined == Missing Reference: 'BR31' is mentioned on line 233, but not defined == Missing Reference: 'BR42' is mentioned on line 233, but not defined == Missing Reference: 'BR43' is mentioned on line 233, but not defined == Missing Reference: 'BR22' is mentioned on line 235, but not defined == Missing Reference: 'BR52' is mentioned on line 235, but not defined == Missing Reference: 'BR53' is mentioned on line 235, but not defined == Missing Reference: 'BR21' is mentioned on line 237, but not defined == Missing Reference: 'BR51' is mentioned on line 237, but not defined == Missing Reference: 'BR12' is mentioned on line 239, but not defined == Missing Reference: 'BR61' is mentioned on line 239, but not defined == Missing Reference: 'BR13' is mentioned on line 241, but not defined == Missing Reference: 'BR71' is mentioned on line 245, but not defined == Missing Reference: 'BR81' is mentioned on line 245, but not defined == Missing Reference: 'BRXY' is mentioned on line 
249, but not defined == Missing Reference: 'HPIM' is mentioned on line 269, but not defined == Missing Reference: 'PIM-SM' is mentioned on line 269, but not defined == Unused Reference: 'DVMRP' is defined on line 1864, but no explicit reference was found in the text == Unused Reference: 'MOSPF' is defined on line 1889, but no explicit reference was found in the text == Unused Reference: 'PIMDM' is defined on line 1893, but no explicit reference was found in the text ** Obsolete normative reference: RFC 1771 (ref. 'BGP') (Obsoleted by RFC 4271) ** Obsolete normative reference: RFC 2283 (ref. 'MBGP') (Obsoleted by RFC 2858) -- Possible downref: Non-RFC (?) normative reference: ref. 'CBT' == Outdated reference: A later version (-02) exists of draft-ietf-idmr-cbt-br-spec-00 -- Possible downref: Normative reference to a draft: ref. 'CBTDM' == Outdated reference: A later version (-11) exists of draft-ietf-idmr-dvmrp-v3-05 ** Downref: Normative reference to an Informational draft: draft-ietf-idmr-dvmrp-v3 (ref. 'DVMRP') -- Possible downref: Non-RFC (?) normative reference: ref. 'DWR' == Outdated reference: A later version (-03) exists of draft-thaler-multicast-interop-01 ** Downref: Normative reference to an Informational draft: draft-thaler-multicast-interop (ref. 'INTEROP') ** Downref: Normative reference to an Informational draft: draft-ietf-ipngwg-multicast-assgn (ref. 'IPv6MAA') == Outdated reference: A later version (-03) exists of draft-ietf-mboned-imrp-some-issues-02 -- Possible downref: Normative reference to a draft: ref. 'ISSUES' -- Possible downref: Non-RFC (?) normative reference: ref. 'MASC' ** Downref: Normative reference to an Historic RFC: RFC 1584 (ref. 'MOSPF') == Outdated reference: A later version (-08) exists of draft-ietf-idmr-pim-dm-spec-05 -- Possible downref: Normative reference to a draft: ref. 'PIMDM' ** Obsolete normative reference: RFC 2117 (ref. 'PIMSM') (Obsoleted by RFC 2362) ** Obsolete normative reference: RFC 1966 (ref. 
'REFLECT') (Obsoleted by RFC 4456) ** Obsolete normative reference: RFC 1700 (Obsoleted by RFC 3232) -- Duplicate reference: RFC1771, mentioned in 'RFC1771', was also mentioned in 'BGP'. ** Obsolete normative reference: RFC 1771 (Obsoleted by RFC 4271) Summary: 15 errors (**), 0 flaws (~~), 32 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 BGMP Working Group D. Thaler 2 Internet Engineering Task Force Microsoft 3 INTERNET-DRAFT D. Estrin 4 January 10, 2000 USC/ISI 5 Expires July 2000 D. Meyer 6 Cisco 7 Editors 9 Border Gateway Multicast Protocol (BGMP): 10 Protocol Specification 11 13 Status of this Memo 15 This document is an Internet-Draft and is in full conformance with all 16 provisions of Section 10 of RFC2026. 18 Internet-Drafts are working documents of the Internet Engineering Task 19 Force (IETF), its areas, and its working groups. Note that other groups 20 may also distribute working documents as Internet-Drafts. 22 Internet Drafts are valid for a maximum of six months and may be 23 updated, replaced, or obsoleted by other documents at any time. It is 24 inappropriate to use Internet Drafts as reference material or to cite 25 them other than as a "work in progress". 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Abstract 35 This document describes BGMP, a protocol for inter-domain multicast 36 routing. BGMP builds shared trees for active multicast groups, and 37 allows receiver domains to build source-specific, inter-domain, 38 distribution branches where needed. Building upon concepts from CBT and 39 PIM-SM, BGMP requires that each multicast group be associated with a 40 single root (in BGMP it is referred to as the root domain). 
BGMP 41 assumes that at any point in time, different ranges of the class D space 43 Draft BGMP January 2000 45 are associated (e.g., with MASC [MASC]) with different domains. Each of 46 these domains then becomes the root of the shared domain-trees for all 47 groups in its range. Multicast participants will generally receive 48 better multicast service if the session initiator's address allocator 49 selects addresses from its own domain's part of the space, thereby 50 causing the root domain to be local to at least one of the session 51 participants. 53 Copyright Notice 55 Copyright (C) The Internet Society (2000). All Rights Reserved. 57 1. Acknowledgements 59 In addition to the editors, the following individuals have 60 contributed to the design of BGMP: Cengiz Alaettinoglu, Tony 61 Ballardie, Steve Casner, Steve Deering, Dino Farinacci, Bill Fenner, 62 Mark Handley, Ahmed Helmy, Van Jacobson, and Satish Kumar. 64 This document is the product of the IETF IDMR Working Group with Dave 65 Thaler, Deborah Estrin, and David Meyer as editors. 67 Rusty Eddy also provided valuable feedback on this document. 69 2. Purpose 71 It has been suggested that inter-domain multicast is better supported 72 with a rendezvous mechanism whereby members receive source's data 73 packets without any sort of global broadcast (e.g., DVMRP and PIM-DM 74 broadcast initial data packets and MOSPF broadcasts membership 75 information). CBT [CBT] and PIM-SM [PIMSM] use a shared group-tree, 76 to which all members join and thereby hear from all sources (and to 77 which non-members do not join and thereby hear from no sources). 79 This document describes BGMP, a protocol for inter-domain multicast 80 routing. BGMP builds shared trees for active multicast groups, and 81 allows domains to build source-specific, inter-domain, distribution 82 branches where needed. Building upon concepts from CBT and PIM-SM, 83 BGMP requires that each global multicast group be associated with a 84 single root. 
However, in BGMP, the root is an entire exchange or 85 domain, rather than a single router. 87 BGMP assumes that ranges of the class D space have been associated 89 Draft BGMP January 2000 91 (e.g., with MASC [MASC]) with selected domains. Each such domain 92 then becomes the root of the shared domain-trees for all groups in 93 its range. An address allocator will generally achieve better 94 distribution trees if it takes its multicast addresses from its own 95 domain's part of the space, thereby causing the root domain to be 96 local. 98 BGMP uses TCP as its transport protocol. This eliminates the need to 99 implement message fragmentation, retransmission, acknowledgement, and 100 sequencing. BGMP uses TCP port 264 for establishing its connections. 101 This port is distinct from BGP's port to provide protocol 102 independence, and to facilitate distinguishing between protocol 103 packets (e.g., by packet classifiers, diagnostic utilities, etc.) 105 Two BGMP peers form a TCP connection between one another, and 106 exchange messages to open and confirm the connection parameters. 107 They then send incremental Join/Prune Updates as group memberships 108 change. BGMP does not require periodic refresh of individual 109 entries. KeepAlive messages are sent periodically to ensure the 110 liveness of the connection. Notification messages are sent in 111 response to errors or special conditions. If a connection encounters 112 an error condition, a notification message is sent and the connection 113 is closed. 115 3. Terminology 117 This document uses the following technical terms: 119 Domain: 120 A set of one or more contiguous links and zero or more routers 121 surrounded by one or more multicast border routers. Note that this 122 loose definition of domain also applies to an external link between 123 two domains, as well as an exchange. 125 Root Domain: 126 When constructing a shared tree of domains for some group, one 127 domain will be the "root" of the tree. 
The root domain receives 128 data from each sender to the group, and functions as a rendezvous 129 domain toward which member domains can send inter-domain joins, and 130 to which sender domains can send data. 134 Multicast RIB: 135 The Routing Information Base, or routing table, used to calculate 136 the "next-hop" towards a particular address for multicast traffic. 138 Multicast IGP (M-IGP): 139 A generic term for any multicast routing protocol used for tree 140 construction within a domain. Typical examples of M-IGPs are: 141 DVMRP, PIM-DM, PIM-SM, CBT, and MOSPF. 143 EGP: A generic term for the interdomain unicast routing protocol in use. 144 Typically, this will be some version of BGP which can support a 145 Multicast RIB, such as MBGP [MBGP], containing both unicast and 146 multicast address prefixes. 148 Component: 149 The portion of a border router associated with (and logically 150 inside) a particular domain that runs the multicast IGP (M-IGP) for 151 that domain, if any. Each border router thus has zero or more 152 components inside routing domains. In addition, each border router 153 with external links that do not fall inside any routing domain will 154 have an inter-domain component that runs BGMP. 156 External peer: 157 A border router in another multicast AS (autonomous system, as used 158 in BGP), to which a BGMP TCP-connection is open. Assuming MBGP is 159 being used, a separate "eBGP" TCP-connection will also be open to 160 the same peer. 162 Internal peer: 163 Another border router of the same multicast AS. A border router 164 either speaks iBGP ("internal" BGP) directly to internal peers in a 165 full mesh, or indirectly through a route reflector [REFLECT]. A 166 border router is only required to establish a BGMP TCP-connection 167 to an internal peer when one border router acts as a data 168 injector for another.
170 Next-hop peer: 171 The next-hop peer towards a given IP address is the next EGP router 172 on the path to the given address, according to multicast RIB routes 173 in the EGP's routing table (e.g., in MBGP, routes whose Subsequent 174 Address Family Identifier field indicates that the route is valid 175 for multicast traffic). 177 target: 178 Either an EGP peer, or an M-IGP component. 180 Draft BGMP January 2000 182 Tree State Table: 183 This is a table of (S-prefix,G-prefix) entries (including (*,G- 184 prefix) entries) that have been explicitly joined by a set of 185 targets. Each entry has, in addition to the source and group 186 addresses and masks, a list of targets that have explicitly 187 requested data (on behalf of directly connected hosts or downstream 188 routers). (S,G) entries also have an "SPT" bit. 190 The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in 191 this document are to be interpreted as described in [RFC2119]. 193 4. Protocol Overview 195 BGMP maintains group-prefix state in response to messages from BGMP 196 peers and notifications from M-IGP components. Group-shared trees 197 are rooted at the domain advertising the group prefix covering those 198 groups. When a receiver joins a specific group address, the border 199 router towards the root domain generates a group-specific Join 200 message, which is then forwarded Border-Router-by-Border-Router 201 towards the root domain (see Figure 1). BGMP Join and Prune messages 202 are sent over TCP connections between BGMP peers, and BGMP protocol 203 state is refreshed by KEEPALIVE messages periodically sent over TCP. 205 BGMP routers build group-specific bidirectional forwarding state as 206 they process the BGMP Join messages. Bidirectional forwarding state 207 means that packets received from any target are forwarded to all 208 other targets in the target list without any RPF checks. 
No group- 209 specific state or traffic exists in parts of the network where there 210 are no members of that group. 212 BGMP routers build source-specific unidirectional forwarding state, 213 only where needed, to be compatible with source-specific trees (SPTs) 214 used by some M-IGPs (e.g., DVMRP, PIM-DM, or PIM-SM). A domain that 215 uses an SPT-based M-IGP may need to inject multicast packets from 216 external sources via different border routers (to be compatible with 217 the M-IGP RPF checks) which thus act as "surrogates". For example, 218 in the Transit_1 domain, data from Src_A arrives at BR12, but must be 219 injected by BR11. A surrogate router may create a source-specific 220 BGMP branch if no shared tree state exists. Note: stub domains with 221 a single border router, such as Rcvr_Stub_7 in Figure 1, receive all 222 multicast data packets through that router, to which all RPF checks 223 point. Therefore, stub domains never build source-specific state. 225 Root_Domain 227 Draft BGMP January 2000 229 [BR91]--------------------------\ 230 | | 231 [BR32] [BR41] 232 Transit_3 Transit_4 233 [BR31] [BR42] [BR43] 234 | | | 235 [BR22] [BR52] [BR53] 236 Transit_2 Transit_5 237 [BR21] [BR51] 238 | | 239 [BR12] [BR61] 240 Transit_1[BR11]----------[BR62]Stub_6 241 [BR13] (Src_A) 242 | (Rcvr_D) 243 ------------------- 244 | | 245 [BR71] [BR81] 246 Rcvr_Stub_7 Src_only_Stub_8 247 (Rcvr_C) (Src_B) 249 Figure 1: Example inter-domain topology. [BRXY] represents a BGMP 250 border 251 router. Transit_X is a transit domain network. *_Stub_X is a stub 252 domain network. 254 Data packets are forwarded based on a combination of BGMP and M-IGP 255 rules. The router forwards to a set of targets according to a 256 matching (S,G) BGMP tree state entry if it exists. If not found, the 257 router checks for a matching (*,G) BGMP tree state entry. 
If neither 258 is found, then the packet is sent natively to the next-hop EGP peer 259 for G, according to the Multicast RIB (for example, in the case of a 260 non-member sender such as Src_B in Figure 1). If a matching entry 261 was found, the packet is forwarded to all other targets in the target 262 list. In this way BGMP trees forward data in a bidirectional manner. 263 If a target is an M-IGP component then forwarding is subject to the 264 rules of that M-IGP protocol. 266 4.1. Design Rationale 268 Several other protocols, or protocol proposals, build shared trees 269 within domains [CBT, HPIM, PIM-SM]. The design choices made for BGMP 270 result from our focus on Inter-Domain multicast in particular. The 271 design choices made by CBT and PIM-SM are better suited to the wide- 273 Draft BGMP January 2000 275 area intra-domain case. There are three major differences between 276 BGMP and other shared-tree protocols: 278 (1) Unidirectional vs. Bidirectional trees 280 Bidirectional trees (using bidirectional forwarding state as 281 described above) minimize third party dependence which is essential 282 in the inter-domain context. For example, in Figure 1, stub domains 283 7 and 8 would like to exchange multicast packets without being 284 dependent on the quality of connectivity of the root domain. 285 However, unidirectional shared trees (i.e., those using RPF checks) 286 have more aggressive loop prevention and share the same processing 287 rules as source-specific entries which are inherently unidirectional. 289 The lack of third party dependence concerns in the INTRA domain case 290 reduces the incentive to employ bidirectional trees. BGMP supports 291 bidirectional trees because it has to, and because it can without 292 excessive cost. 
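As an illustration only, the bidirectional forwarding rule from the Protocol Overview, which rationale (1) above relies on, can be sketched in Python. The names forward, tree_state and next_hop_peer are hypothetical and are not defined by BGMP. The sketch assumes exact-match dictionary keys in place of prefix matching, and it omits the packet acceptance step and any SPT-bit handling.

```python
def forward(tree_state, next_hop_peer, src, group, arrived_from):
    """Return the set of targets to which a data packet is forwarded.

    tree_state maps (S, G) or ("*", G) keys to a set of targets
    (assumed representation). next_hop_peer(group) stands in for a
    Multicast RIB lookup of the next-hop EGP peer towards G.
    """
    # Prefer a matching (S,G) entry, then fall back to (*,G).
    targets = tree_state.get((src, group))
    if targets is None:
        targets = tree_state.get(("*", group))
    if targets is None:
        # No tree state at all: send natively to the next-hop peer for G.
        return {next_hop_peer(group)}
    # Bidirectional state: forward to all other targets, no RPF check.
    return set(targets) - {arrived_from}
```

Because the arrival target is simply excluded from the result, the same entry serves traffic flowing towards the root domain and away from it.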
294 (2) Source-specific distribution trees/branches 296 In a departure from other shared tree protocols, source-specific BGMP 297 state is built ONLY where (a) it is needed to pull the multicast 298 traffic down to a BGMP router that has source-specific (S,G) state, 299 and (b) that router is NOT already on the shared tree (i.e., has no 300 (*,G) state), and (c) that router does not want to receive packets 301 via encapsulation from a router which is on the shared tree. 302 BGMP provides source-specific branches because most M-IGP protocols 303 in use today build source-specific trees. BGMP's source-specific 304 branches eliminate the unnecessary overhead of encapsulations for 305 high data rate sources from the shared tree's ingress router to the 306 surrogate injector (e.g., from BR12 to BR11 in Figure 1). Moreover, 307 cases in which shared paths are significantly longer than SPT paths 308 will also benefit. 310 However, we do not build source-specific inter-domain trees in 311 general because (a) inter-domain connectivity is generally less rich 312 than intra-domain connectivity, so shared distribution trees should 313 have more acceptable path length and traffic concentration properties 314 in the inter-domain context than in the intra-domain case, and (b) 315 by having the shared tree state always take precedence over source- 316 specific tree state, we avoid ambiguities that can otherwise arise. 318 In summary, BGMP trees are, in a sense, a hybrid between CBT and PIM- 319 SM trees. 323 (3) Method of choosing root of group shared tree 325 The choice of a group's shared-tree-root has implications for 326 performance and policy. In the intra-domain case it can be assumed 327 that all potential shared-tree roots (RPs/Cores) within the domain 328 are equally suited to be the root for a group that is initiated 329 within that domain.
In the INTER-domain case, there is far more 330 opportunity for unacceptably poor locality, and administrative 331 control of a group's shared-tree root. Therefore in the intra-domain 332 case, other protocols treat all candidate roots (RPs or Cores) as 333 equivalent and emphasize load sharing and stability to maximize 334 performance. In the Inter-Domain case, not all roots are equivalent, 335 and we adopt an approach whereby a group's root domain is not random 336 but is subject to administrative and performance input. 338 5. Protocol Details 340 In this section, we describe the detailed protocol that border 341 routers perform. We assume that each border router conforms to the 342 component-based model described in [INTEROP]. 344 5.1. Interaction with the EGP 346 A fundamental requirement imposed by BGMP on the design of an EGP is 347 that it be able to carry multicast prefixes. For example, a multi- 348 protocol BGP (MBGP) must be able to carry a multicast prefix in the 349 Unicast Network Layer Reachability Information (NLRI) field of the 350 UPDATE message (i.e., either an IPv4 class D prefix or an IPv6 prefix 351 with high-order octet equal to FF [IPv6MAA]). This capability is 352 required by BGMP in the implementation of bi-directional trees; BGMP 353 must be able to forward data and control packets to the next hop 354 towards either a unicast source S or a multicast group G (see section 355 5.2). It is also required that the path attributes defined in 356 [RFC1771] have the same semantics whether they accompany unicast 357 or multicast NLRI. 359 MBGP [MBGP] satisfies the requirement described above. [MBGP] defines 360 the optional non-transitive attributes Multiprotocol Reachable NLRI 361 (MP_REACH_NLRI) and Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI) to 362 carry sets of reachable or unreachable destinations, and the 363 appropriate next hop in the case of MP_REACH_NLRI.
These attributes 364 contain an Address Family Information field [RFC1700] which indicates 365 the type of NLRI carried in the attribute. In addition, the attribute 367 Draft BGMP January 2000 369 carries another field, the Subsequent Address Family Identifier, or 370 SAFI, which can be used to provide additional information about the 371 type of NLRI. For example, SAFI value two indicates that the NLRI is 372 valid for multicast forwarding. BGMP's requirement can be satisfied 373 by allowing the NLRI field of the MP_REACH_NLRI (or MP_UNREACH_NLRI) 374 to carry a multicast prefix in the Prefix field of the NLRI encoding. 376 Finally, while not required for correct BGMP operation, the design of 377 an EGP should also provide a mechanism that allows discrimination 378 between NLRI that is to be used for unicast forwarding and NLRI to be 379 used for multicast forwarding. This property is required to support 380 multicast-specific policy. As mentioned above, MBGP [MBGP] has this 381 capability. 383 5.2. Multicast Data Packet Processing 385 For BGMP rules to be applied, an incoming packet must first be 386 "accepted": 388 o If the packet arrived on an interface owned by an M-IGP, the M-IGP 389 component determines whether the packet should be accepted or 390 dropped according to its rules. If the packet is accepted, the 391 packet is forwarded (or not forwarded) out any other interfaces 392 owned by the same component, as specified by the M-IGP. 394 o If the packet was received over a point-to-point interface owned 395 by BGMP, the packet is accepted. 397 o If the packet arrived on a multiaccess network interface owned by 398 BGMP, the packet is accepted if it is the designated forwarder for 399 longest matching route for S, if it is receiving data on a source- 400 specific branch, or for the longest matching route for G. 402 If the packet is accepted, then the router checks the tree state 403 table for a matching (S,G) entry. 
If one is found, but the packet 404 was not received from the next hop target towards S (if the entry's 405 SPT bit is True), or was not received from the next hop target 406 towards G (if the entry's SPT bit is False) then the packet is 407 dropped and no further actions are taken. If no (S,G) entry was 408 found, the router then checks for a matching (*,G) entry. 410 If neither is found, then the packet is forwarded towards the next- 411 hop peer for G, according to the Multicast RIB. If a matching entry 412 was found, the packet is forwarded to all other targets in the target 414 Draft BGMP January 2000 416 list. 418 Forwarding to a target which is an M-IGP component means that the 419 packet is forwarded out any interfaces owned by that component 420 according to that component's multicast forwarding rules. 422 5.3. BGMP processing of Join and Prune messages and notifications 424 5.3.1. Receiving Joins 426 When the BGMP component receives a (*,G) or (S,G) Join alert from 427 another component, or a BGMP (S,G) or (*,G) Join message from an 428 external peer, it searches the tree state table for a matching entry. 429 If an entry is found, and that peer is already listed in the target 430 list, then no further actions are taken. 432 Otherwise, if no (*,G) or (S,G) entry was found, one is created. In 433 the case of a (*,G), the target list is initialized to contain the 434 next-hop peer towards G, if it is an external peer. If the peer is 435 internal, the target list is initialized to contain the M-IGP 436 component owning the next-hop interface. If there is no next-hop 437 peer (because G is inside the domain), then the target list is 438 initialized to contain the next-hop component. 
If an (S,G) entry 439 exists for the same G for which the (*,G) Join is being processed, 440 and the next-hop peers toward S and G are different, the BGMP router 441 must first send an (S,G) Prune message toward the source and clear the 442 SPT bit on the (S,G) entry, before activating the (*,G) entry. 444 The target from which the Join was received is then added to the 445 target list. The router then looks up S or G in the Multicast RIB to 446 find the next-hop EGP peer. If the target list, not including the 447 next-hop target towards G for a (*,G) entry, becomes non-null as a 448 result, the next-hop EGP peer must be notified as follows: 450 a) If the next-hop peer towards G (for a (*,G) entry) is an external 451 peer, a BGMP (*,G) Join message is unicast to the external peer. 452 If the next-hop peer towards S (for an (S,G) entry) is an external 453 peer, and the router does NOT have any active (*,G) state for that 454 group address G, a BGMP (S,G) Join message is unicast to the 455 external peer. A BGMP (S,G) Join message is never sent to an 456 external peer by a router that also contains active (*,G) state 457 for the same group. If the next-hop peer towards S (for an (S,G) 458 entry) is an external peer and the router DOES have active (*,G) 459 state for that group G, the SPT bit is always set to False. 463 b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join 464 alert is sent to the M-IGP component owning the next-hop 465 interface. 467 c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent 468 to the M-IGP component owning the next-hop interface. 470 5.3.2. Receiving Prune Notifications 472 When the BGMP component receives a (*,G) or (S,G) Prune alert from 473 another component, or a BGMP (*,G) or (S,G) Prune message from an 474 external peer, it searches the tree state table for a matching entry.
475 If no (S,G) entry was found for an (S,G) Prune, but (*,G) state 476 exists, an (S,G) entry is created, with the target list copied from 477 the (*,G) entry. If no matching entry exists, or if the component or 478 peer is not listed in the target list, no further actions are taken. 480 Otherwise, the component or peer is removed from the target list. If 481 the target list becomes null as a result, the next-hop peer towards G 482 (for a (*,G) entry), or towards S (for an (S,G) entry if and only if 483 the BGMP router does NOT have any corresponding (*,G) entry), must be 484 notified as follows: 486 a) If the next-hop peer is an external peer, a BGMP (*,G) or (S,G) Prune 487 message is unicast to it. 489 b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune 490 alert is sent to the M-IGP component owning the next-hop 491 interface. 493 c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent 494 to the M-IGP component owning the next-hop interface. 496 5.3.3. Receiving Route Change Notifications 498 When a border router receives a route for a new prefix in the 499 multicast RIB, or an existing route for a prefix is withdrawn, a route 500 change notification for that prefix must be sent to the BGMP 501 component. In addition, when the next hop peer (according to the 502 multicast RIB) changes, a route change notification for that prefix 503 must be sent to the BGMP component. 507 In addition, an internal route for each class-D prefix associated 508 with the domain (if any) MUST be injected into the multicast RIB in 509 the EGP by the domain's border routers. 511 When a route for a new group prefix is learned, or an existing route 512 for a group prefix is withdrawn, or the next-hop peer for a group 513 prefix changes, a BGMP router updates all affected (*,G) target 514 lists. The router sends a (*,G) Join to the new next-hop target, and 515 a (*,G) Prune to the old next-hop target, as appropriate.
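The (*,G) update just described can be sketched in Python as follows. This is a minimal illustration, not a normative procedure. The function name on_group_route_change and the tuple message format are hypothetical. The text does not define "as appropriate", so the sketch assumes it means skipping the Join when there is no new next hop and skipping the Prune when there was no old one.

```python
def on_group_route_change(star_g_targets, affected_groups, old_hop, new_hop):
    """Update (*,G) target lists after a Multicast RIB change.

    star_g_targets maps a group G to the set of targets of its (*,G)
    entry (assumed representation). old_hop or new_hop may be None when
    a route is newly learned or withdrawn. Returns the messages to send
    as (type, source, group, destination) tuples.
    """
    messages = []
    if old_hop == new_hop:
        return messages  # next hop unchanged, nothing to do
    for group in affected_groups:
        targets = star_g_targets.get(group)
        if targets is None:
            continue  # no (*,G) state for this group
        if new_hop is not None:
            targets.add(new_hop)
            messages.append(("Join", "*", group, new_hop))
        if old_hop is not None:
            targets.discard(old_hop)
            messages.append(("Prune", "*", group, old_hop))
    return messages
```

Groups covered by the changed prefix but lacking (*,G) state are skipped, since there is no target list to update for them.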
517 When an existing route for a source prefix is withdrawn, or the next- 518 hop peer for a source prefix changes, a BGMP router updates all 519 affected (S,G) target lists. The router sends an (S,G) Join to the 520 new next-hop target, and an (S,G) Prune to the old next-hop target, as 521 appropriate. 523 5.4. Interaction with M-IGP components 525 When an M-IGP component on a border router first learns that there 526 are internally-reached members for a group G (whose scope is larger 527 than that domain), a (*,G) Join alert is sent to the BGMP component. 528 Similarly, when an M-IGP component on a border router learns that 529 there are no longer internally-reached members for a group G (whose 530 scope is larger than a single domain), a (*,G) Prune alert is sent to 531 the BGMP component. 533 At any time, any M-IGP domain MAY decide to join a source-specific 534 branch for some external source S and group G. When the M-IGP 535 component in the border router that is the next-hop router for a 536 particular source S learns that a receiver wishes to receive data 537 from S on a source-specific path, an (S,G) Join alert is sent to the 538 BGMP component. When it is learned that such receivers no longer 539 exist, an (S,G) Prune alert is sent to the BGMP component. Recall 540 that the BGMP component will generate external source-specific Joins 541 only where the source-specific branch does not coincide with the 542 shared distribution tree for that group. 544 Finally, we require that the border router that is the next-hop 545 internal peer for a particular address S or G be able to forward data 546 for a matching tree state table entry to all members within the 547 domain. This requirement has implications for specific M-IGPs as 548 follows. 552 5.4.1.
Interaction with DVMRP and PIM-DM

DVMRP and PIM-DM are both "broadcast and prune" protocols in which
every data packet must pass an RPF check against the packet's source
address, or be dropped. If the border router receiving packets from
an external source is the only BR to inject the route for the source
into the domain, then there are no problems. For example, this will
always be true for stub domains with a single border router (see
Figure 1). Otherwise, the border router receiving packets externally
is responsible for encapsulating the data to any other border routers
that must inject the data into the domain for RPF checks to succeed.
Although peering sessions to internal peers are normally not
required, in this situation, BGMP TCP-connections must exist between
such internal peers, and the "virtual" interfaces used for
encapsulation are owned by BGMP.

When an intended border router injector for a source receives
encapsulated packets from another border router in its domain, it
should create source-specific (S,G) BGMP state. Note that the border
router may be configured to do this on a data-rate triggered basis so
that the state is not created for very low data-rate/intermittent
sources. If source-specific state is created, then its incoming
interface points to the virtual encapsulation interface from the
border router that forwarded the packet, and it has an SPT flag that
is initialized to be False.

When the (S,G) BGMP state is created, the BGMP component will in turn
send a BGMP (S,G) Join message to the next-hop external peer towards
S if there is no (*,G) state for that same group, G. The (S,G) BGMP
state will have the SPT bit set to False if (*,G) BGMP state is
present.

When the first data packet from S arrives from the external peer and
matches on the BGMP (S,G) state, and IF there is no (*,G) state, the
router sets the SPT flag to True, resets the incoming interface to
point to the external peer, and sends a BGMP (S,G) Prune message to
the border router that was encapsulating the packets (e.g., in Figure
1, BR11 sends the (Src_A,G) Prune to BR12). When the border router
with (*,G) state receives the prune for (S,G), it then deletes that
border router from its list of targets.

PIM-DM and DVMRP present an additional problem, i.e., no protocol
mechanism exists for joining and pruning entire groups; only joins
and prunes for individual sources are available. We therefore
require that some form of Domain-Wide Reports (DWRs) [DWR] be
available within such domains. Such messages provide the ability to
join and prune an entire group across the domain. One simple
heuristic to approximate DWRs is to assume that if there are any
internally-reached members, then at least one of them is a sender.
With this heuristic, the presence of any M-IGP (S,G) state for
internally-reached sources can be used instead. Sending a data
packet to a group is then equivalent to sending a DWR for the group.

5.4.2. Interaction with PIM-SM

Protocols such as PIM-SM build unidirectional shared and
source-specific trees. As with DVMRP and PIM-DM, every data packet
must pass an RPF check against some group-specific or source-specific
address.

The fewest encapsulations/decapsulations will be done when the
intra-domain tree is rooted at the next-hop internal peer towards G
(which becomes the RP), since in general that router will receive the
most packets from external sources.
To achieve this, each BGMP border
router to a PIM-SM domain should send Candidate-RP-Advertisements
within the domain for those groups for which it is the shared-domain
tree ingress router. When the border router that is the RP for a
group G receives an external data packet, it forwards the packet
according to the M-IGP (i.e., PIM-SM) shared-tree outgoing interface
list.

Other border routers will receive data packets from external sources
that are farther down the bidirectional tree of domains. When a
border router that is not the RP receives an external packet for
which it does not have a source-specific entry, the border router
treats it like a local source by creating (S,G) state with a Register
flag set, based on normal PIM-SM rules; the border router then
encapsulates the data packets in PIM-SM Registers and unicasts them
to the RP for the group. As explained above, the RP for the
inter-domain group will be one of the other border routers of the
domain.

If a source's data rate is high enough, DRs within the PIM-SM domain
may switch to the shortest path tree. If the shortest path to an
external source is via the group's ingress router for the shared
tree, the new (S,G) state in the BGMP border router will not cause
BGMP (S,G) Joins because that border router will already have (*,G)
state. If, however, the shortest path to an external source is via
some other border router, that border router will create (S,G) BGMP
state in response to the M-IGP (S,G) Join alert. In this case,
because there is no local (*,G) state to suppress it, the border
router will send a BGMP (S,G) Join to the next-hop external peer
towards S, in order to pull the data down directly. (See BR11 in
Figure 1.)
As in normal PIM-SM operation, those PIM-SM routers that
have (*,G) and (S,G) state pointing to different incoming interfaces
will prune that source off the shared tree. Therefore, all internal
interfaces may eventually be pruned off the internal shared tree.

5.4.3. Interaction with CBT

CBT builds bidirectional shared trees but must address two points of
compatibility with BGMP. First, CBT cannot accommodate more than
one border router injecting a packet. Therefore, if a CBT domain
does have multiple external connections, the M-IGP components of the
border routers are responsible for ensuring that only one of them
will inject data from any given source. This mechanism is provided
in [CBTDM].

Second, CBT cannot process source-specific Joins or Prunes. Two
options thus exist for each CBT domain:

Option A:
   The CBT component interprets an (S,G) Join alert as if it were a
   (*,G) Join alert, as described in [INTEROP]. That is, if it is not
   already on the core-tree for G, then it sends a CBT (*,G)
   JOIN-REQUEST message towards the core for G. Similarly, when the
   CBT component receives an (S,G) Prune alert, and the child
   interface list for a group is NULL, then it sends a (*,G)
   QUIT_NOTIFICATION towards the core for G. This option has the
   disadvantage of pulling all data for the group G down to the CBT
   domain when no members exist.

Option B:
   The CBT domain does not propagate any source routes (i.e.,
   non-class D routes) to their external peers for the Multicast RIB
   unless it is known that no other path exists to that prefix (e.g.,
   routes for prefixes internal to the domain or in a singly-homed
   customer's domain may be propagated). This ensures that
   source-specific joins are never received unless the source's data
   already passes through the domain on the shared tree, in which case
   the (S,G) Join need not be propagated anyway.
   BGMP border routers will
   only send source-specific Joins or Prunes to an external peer if
   that external peer advertises source-prefixes in the EGP. If a
   BGMP-CBT border router does receive an (S,G) Join or Prune, that
   border router should ignore the message.

   To minimize en/de-capsulations, CBTv2 BRs may follow the same
   scheme as described under PIM-SM above, in which Candidate-Core
   advertisements are sent for those groups for which it is the
   shared-tree ingress router.

5.4.4. Interaction with MOSPF

As with CBTv2, MOSPF cannot process source-specific Joins or Prunes,
and the same two options are available. Therefore, an MOSPF domain
may either:

Option A:
   send a Group-Membership-LSA for all of G in response to an (S,G)
   Join alert, and "prematurely age" it out (when no other downstream
   members exist) in response to an (S,G) Prune alert, OR

Option B:
   not propagate any source routes (i.e., non-class D routes) to their
   external peers for the Multicast RIB unless it is known that no
   other path exists to that prefix (e.g., routes for prefixes
   internal to the domain or in a singly-homed customer's domain may
   be propagated)

5.5. Operation over Multi-access Networks

Multiaccess links require special handling to prevent duplicates.
The following mechanism enables BGMP to operate over multiaccess
links which do not run an M-IGP. This avoids broadcast-and-prune
behavior and does not require (S,G) state.

To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF
message to exchange "forwarder preference" values for each prefix.
The peer with the highest forwarder preference becomes the designated
forwarder, with ties broken by lowest BGMP Identifier. The
designated forwarder is the router responsible for forwarding packets
up the tree, and is the peer to which joins will be sent.
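
As an illustration only, the election rule above can be sketched in
Python as follows. The function name and the dictionary that maps
each peer's BGMP Identifier to the preference it advertised are
assumptions made for this sketch; they are not defined by this
document.

```python
from typing import Dict, Optional


def elect_designated_forwarder(prefs: Dict[int, int]) -> Optional[int]:
    """Pick the designated forwarder for one prefix.

    `prefs` is a hypothetical structure mapping each peer's BGMP
    Identifier to the forwarder preference it advertised for the prefix.
    Returns the winning BGMP Identifier, or None if nobody advertised.
    """
    if not prefs:
        return None
    # Highest preference wins; ties are broken by lowest BGMP Identifier.
    return min(prefs, key=lambda ident: (-prefs[ident], ident))
```

Sorting on the pair (negated preference, identifier) applies both
rules in a single comparison, so the tie-break cannot be forgotten
when the preference values are equal.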

When BGMP first learns that a route exists in the multicast RIB whose
next-hop interface is NOT the multiaccess link, the BGMP router sends
a BGMP FWDR_PREF message for the prefix, to all BGMP peers on the
LAN. The FWDR_PREF message contains a "forwarder preference value"
for the local router, and the same value MUST be sent to all peers on
the LAN. Likewise, when the prefix is no longer reachable, a
FWDR_PREF of 0 is sent to all peers on the LAN.

Whenever a BGMP router calculates the next-hop peer towards a
particular address, and that peer is reached over a BGMP-owned
multiaccess LAN, the designated forwarder is used instead.

When a BGMP router receives a FWDR_PREF message from a peer, it looks
up the matching route in its multicast RIB, and calculates the new
designated forwarder. If the router has tree state entries whose
parent target was the old forwarder, it sends Joins to the new
forwarder and Prunes to the old forwarder.

When a BGMP router which is NOT the designated forwarder receives a
packet on the multiaccess link, it is silently dropped.

Finally, this mechanism prevents duplicates where full peering exists
on a "logical" link. Where full peering does not exist, steps must
be taken (outside of BGMP) to present separate logical interfaces to
BGMP, each of which is a link with full peering. This might entail,
for example, using different link-layer address mappings, doing
encapsulation, or changing the physical media.

6. Interaction with address allocation

6.1. Requirements for BGMP components

Each border router must be able to determine (e.g., from MASC [MASC])
which class-D prefixes (if any) belong to each domain in which an
M-IGP component resides, so that it can inject routes for them into
the routing table.

7.
Transition Strategy

There have been significant barriers to multicast deployment in
Internet backbones. While many of the problems with the current
DVMRP backbone (MBONE) have been documented in [ISSUES], most of
these problems require longer term engineering solutions. However,
there is much that can be done with existing technologies to enable
deployment and put in place an architecture that will enable a smooth
transition to the next generation of inter-domain multicast routing
protocols (i.e., BGMP). This section proposes a near-term transition
strategy and architecture that is designed to be simple,
risk-neutral, and provide a smooth, incremental transition path to
BGMP. In addition, the transition architecture provides for improved
convergence properties, some initial policy control, and the
opportunity for providers to run either native or tunneled multicast
backbones and exchanges.

The transition strategy proposed here is to initially use MBGP [MBGP]
to provide the desired convergence and policy control properties, and
PIM-DM for multicast data forwarding. Once this architecture is in
place, backbones and exchanges can incrementally transition to BGMP
and domains running other M-IGPs may be incorporated more fully.

Since the current MBone uses a broadcast-and-prune backbone running
DVMRP, BGMP may view the entire MBone as a single multi-homed stub
domain (with a new AS number). The members-are-senders heuristic can
then be used initially to provide membership notifications within
this stub domain.

A BGMP backbone can then be formed by designating one or more neutral
PIM-DM domains (say, exchanges) as initial BGMP backbones. Each
exchange is then associated with a group prefix which is injected
into the Multicast RIB by all MBGP/BGMP border routers on that
exchange.

Any domain which meets the following constraints may then transition
from a normal MBone-connected domain to one running BGMP:

(1) Must peer with another BGMP domain and participate in MBGP to
    propagate routes in the Multicast RIB.

(2) Must establish an internal (to the MBone AS) EGP (e.g., iBGP)
    peer relationship with other border routers of the MBone "stub"
    domain, as is done with unicast routing. We expect this to
    eventually involve the use of one or more route reflectors
    [REFLECT] inside the MBone domain.

(3) If the transition will partition the MBone "stub" domain, then it
    must be ensured that the MBone domain will be administratively
    split into multiple domains, each with a different multicast AS
    number.

7.1. Preventing transit through the MBone stub

We desire that two ASes which are mutually reachable through BGMP use
paths which do not pass through the MBone stub domain. This is
illustrated in Figure 2, where the MBone stub is AS 5, which is
multi-homed to both AS 3 and AS 4. Paths between sources and
destinations which have already transitioned to MBGP/BGMP should not
use AS 5 as transit unless no other path exists.

----------------------\     /----------------------------
                      |     |
   DVMRP       /----\ |     | /----\   IGP/iBGP
 ..............| BR |+++++++++| BR |-----------
               \----/ |  E  | \----/
                  +   |  B  |     +        AS 3
  MBone           +   |  G  |     +
                  +   |  P  \-----+----------------------
  AS 5      iBGP  +   |           + eBGP
                  +   |     /-----+----------------------
                  +   |     |     +
                  +   |     |     +
   DVMRP       /----\ |     | /----\   IGP/iBGP
 ..............| BR |+++++++++| BR |-----------
               \----/ |     | \----/
                      |     |              AS 4
                      |     |
----------------------/     \----------------------------

       Figure 2: Preventing Transit through MBone Stub

This requirement is easily solved using standard BGP policy
mechanisms.
The MBone border routers should prefer EGP routes to
DVMRP routes, since DVMRP cannot tag routes as being external. Thus,
external routes may appear in the DVMRP routing table, but will not
be imported into the EGP since they will be overridden by iBGP
routes.

Other EGP routers should prefer routes whose ASpath does not contain
the well-known MBone AS number. This will ensure that the route
through the MBone stub is not used unless no other path exists. For
safety, routes whose ASpath begins with the MBone AS should receive
the worst preference.

8. Message Formats

This section describes message formats used by BGMP.

Messages are sent over a reliable transport protocol connection. A
message is processed only after it is entirely received. The maximum
message size is 4096 octets. All implementations are required to
support this maximum message size.

All fields labelled "Reserved" below must be transmitted as 0, and
ignored upon receipt.

8.1. Message Header Format

Each message has a fixed-size (4-byte) header. There may or may not
be a data portion following the header, depending on the message
type. The layout of these fields is shown below:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |      Type     |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Length:
   This 2-octet unsigned integer indicates the total length of the
   message, including the header, in octets. Thus, e.g., it allows
   one to locate in the transport-level stream the start of the next
   message. The value of the Length field must always be at least 4
   and no greater than 4096, and may be further constrained, depending
   on the message type.
   No "padding" of extra data after the message
   is allowed, so the Length field must have the smallest value
   required given the rest of the message.

Type:
   This 1-octet unsigned integer indicates the type code of the
   message. The following type codes are defined:

      1 - OPEN
      2 - UPDATE
      3 - NOTIFICATION
      4 - KEEPALIVE

8.2. OPEN Message Format

After a transport protocol connection is established, the first
message sent by each side is an OPEN message. If the OPEN message is
acceptable, a KEEPALIVE message confirming the OPEN is sent back.
Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION
messages may be exchanged.

In addition to the fixed-size BGMP header, the OPEN message contains
the following fields:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |   Reserved    |           Hold Time           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        BGMP Identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                     (Optional Parameters)                     |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version:
   This 1-octet unsigned integer indicates the protocol version number
   of the message. The current BGMP version number is 1.

Hold Time:
   This 2-octet unsigned integer indicates the number of seconds that
   the sender proposes for the value of the Hold Timer. Upon receipt
   of an OPEN message, a BGMP speaker MUST calculate the value of the
   Hold Timer by using the smaller of its configured Hold Time and the
   Hold Time received in the OPEN message. The Hold Time MUST be
   either zero or at least three seconds. An implementation may
   reject connections on the basis of the Hold Time.
   The calculated
   value indicates the maximum number of seconds that may elapse
   between the receipt of successive KEEPALIVE and/or UPDATE messages
   by the sender.

BGMP Identifier:
   This 4-octet unsigned integer indicates the BGMP Identifier of the
   sender. A given BGMP speaker sets the value of its BGMP Identifier
   to a globally-unique value assigned to that BGMP speaker (e.g., an
   IPv4 address). The value of the BGMP Identifier is determined on
   startup and is the same for every BGMP session opened.

Optional Parameters:
   This field may contain a list of optional parameters, where each
   parameter is encoded as a <Parameter Type, Parameter Length,
   Parameter Value> triplet. The combined length of all optional
   parameters can be derived from the Length field in the message
   header.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
   |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

   Parameter Type is a one octet field that unambiguously identifies
   individual parameters. Parameter Length is a one octet field that
   contains the length of the Parameter Value field in octets.
   Parameter Value is a variable length field that is interpreted
   according to the value of the Parameter Type field.

   This document defines the following Optional Parameters:

a) Authentication Information (Parameter Type 1):
   This optional parameter may be used to authenticate a BGMP peer.
   The Parameter Value field contains a 1-octet Authentication Code
   followed by a variable length Authentication Data.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |  Auth. Code   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                     |
   |                 Authentication Data                 |
   |                                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Authentication Code:

      This 1-octet unsigned integer indicates the authentication
      mechanism being used.
      Whenever an authentication mechanism is
      specified for use within BGMP, three things must be included in
      the specification:

      - the value of the Authentication Code which indicates use of
        the mechanism,
      - the form and meaning of the Authentication Data, and
      - the algorithm for computing values of Marker fields.

      Note that a separate authentication mechanism may be used in
      establishing the transport level connection.

   Authentication Data:

      The form and meaning of this variable-length field depend on
      the Authentication Code.

   The minimum length of the OPEN message is 12 octets (including
   message header).

b) Capability Information (Parameter Type 2):
   This is an Optional Parameter that is used by a BGMP-speaker to
   convey to its peer the list of capabilities supported by the
   speaker. The parameter contains one or more triples <Capability
   Code, Capability Length, Capability Value>, where each triple is
   encoded as shown below:

   +------------------------------+
   | Capability Code (1 octet)    |
   +------------------------------+
   | Capability Length (1 octet)  |
   +------------------------------+
   | Capability Value (variable)  |
   +------------------------------+

   Capability Code:

      Capability Code is a one octet field that unambiguously
      identifies individual capabilities.

   Capability Length:

      Capability Length is a one octet field that contains the length
      of the Capability Value field in octets.

   Capability Value:

      Capability Value is a variable length field that is interpreted
      according to the value of the Capability Code field.

   A particular capability, as identified by its Capability Code, may
   occur more than once within the Optional Parameter.

   This document reserves Capability Codes 128-255 for vendor-specific
   applications.

   This document reserves Capability Code 0.
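
   The triple encoding described above can be walked with a short
   loop. The following Python sketch is illustrative only: the
   function names are hypothetical, and the choice to raise an error
   on a truncated triple is an assumption of this sketch rather than
   behavior specified by this document.

```python
from typing import List, Tuple


def parse_capabilities(value: bytes) -> List[Tuple[int, bytes]]:
    """Split a Capability Information parameter value into
    (Capability Code, Capability Value) pairs."""
    caps: List[Tuple[int, bytes]] = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated capability header")
        code = value[offset]        # Capability Code (1 octet)
        length = value[offset + 1]  # Capability Length (1 octet)
        end = offset + 2 + length
        if end > len(value):
            raise ValueError("capability value runs past end of parameter")
        caps.append((code, value[offset + 2:end]))
        offset = end
    return caps


def is_vendor_specific(code: int) -> bool:
    """Codes 128-255 are reserved for vendor-specific applications."""
    return 128 <= code <= 255
```

   Because the same Capability Code may occur more than once, the
   sketch returns a list of pairs rather than a dictionary keyed by
   code.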

   Capability Codes (other than those reserved for vendor specific
   use) are assigned only by the IETF consensus process and IESG
   approval.

8.3. UPDATE Message Format

UPDATE messages are used to transfer Join/Prune/FwdrPref information
between BGMP peers. The UPDATE message always includes the
fixed-size BGMP header, and one or more attributes as described
below.

The message format below allows compact encoding of (*,G) Joins and
Prunes, while allowing the flexibility needed to do other updates
such as (S,G) Joins and Prunes towards sources as well as on the
shared tree. In the discussion below, an Encoded-Address-Prefix is
of the form:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|EnTyp| AddrFam |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Address (variable length)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Mask (variable length)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

EnTyp:
   0 - All 1's Mask. The Mask field is 0 bytes long.
   1 - Mask length included. The Mask field is 4 bytes long, and
       contains the mask length, in bits.
   2 - Full Mask included. The Mask field is the same length
       as the Address field, and contains the full bitmask.

AddrFam:
   The IANA-assigned address family number of the encoded prefix.
   These include (among others):

      Number    Description
      ------    -----------
         1      IP (IP version 4)
         2      IPv6 (IP version 6)

Address:
   The address associated with the given prefix to be encoded. The
   length is determined based on the Address Family.

Mask:
   The mask associated with the given prefix. The format (or absence)
   of this field is determined by the EnTyp field.
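
The three EnTyp encodings can be compared side by side with a small
encoder. The Python sketch below is illustrative and rests on
assumptions not stated in the text: the first octet is read from the
diagram as 3 bits of EnTyp followed by 5 bits of AddrFam, multi-octet
values are taken to be in network byte order, and the function and
constant names are hypothetical.

```python
import ipaddress
import struct

ENTYP_ALL_ONES, ENTYP_MASK_LEN, ENTYP_FULL_MASK = 0, 1, 2
AF_IPV4, AF_IPV6 = 1, 2  # IANA address family numbers listed above


def encode_prefix(prefix: str, entyp: int) -> bytes:
    """Encode `prefix` (e.g. "224.2.0.0/16") as an Encoded-Address-Prefix."""
    net = ipaddress.ip_network(prefix)
    addrfam = AF_IPV4 if net.version == 4 else AF_IPV6
    # Assumed bit split: EnTyp in the top 3 bits, AddrFam in the low 5.
    out = bytes([(entyp << 5) | addrfam]) + net.network_address.packed
    if entyp == ENTYP_ALL_ONES:
        # No Mask field at all, so only a full-length prefix fits.
        if net.prefixlen != net.max_prefixlen:
            raise ValueError("EnTyp 0 implies an all-1's mask")
    elif entyp == ENTYP_MASK_LEN:
        out += struct.pack("!I", net.prefixlen)  # 4-byte mask length in bits
    elif entyp == ENTYP_FULL_MASK:
        out += net.netmask.packed  # same length as the Address field
    else:
        raise ValueError("unknown EnTyp")
    return out
```

For an IPv4 prefix this yields 5 octets with EnTyp 0 and 9 octets
with EnTyp 1 or 2; for IPv6, EnTyp 1 is the more compact way to
carry a contiguous mask, since the full mask costs 16 octets.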

Each attribute is of the form:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |     Type      |    Data ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

All attributes are 4-byte aligned.

Length:
   The Length is the length of the entire attribute, including the
   length, type, and data fields. If other attributes are nested
   within the data field, the length includes the size of all such
   nested attributes.

Type:
   Types 128-255 are reserved for "optional" attributes. If a
   required attribute is unrecognized, a NOTIFICATION will be sent and
   the connection will be closed. Unrecognized optional attributes
   are simply ignored.

      0 - JOIN
      1 - PRUNE
      2 - GROUP
      3 - SOURCE
      4 - FWDR_PREF

a) JOIN (Type Code 0)

   The JOIN attribute indicates that all GROUP or SOURCE attributes
   nested immediately within the JOIN attribute should be joined.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=0     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested
   within a JOIN attribute.

b) PRUNE (Type Code 1)

   The PRUNE attribute indicates that all GROUP or SOURCE attributes
   nested immediately within the PRUNE attribute should be pruned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=1     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested
   within a PRUNE attribute.

c) GROUP (Type Code 2)

   The GROUP attribute identifies a given group-prefix. In addition,
   any attributes nested immediately within the GROUP attribute also
   apply to the given group-prefix.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=2     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Encoded-Address-Prefix
      The multicast group prefix to be joined or pruned, in the format
      described above.

   Nested Attributes
      No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
      nested within a GROUP attribute.

d) SOURCE (Type Code 3):

   The SOURCE attribute identifies a given source-prefix. In
   addition, any attributes nested immediately within the SOURCE
   attribute also apply to the given source-prefix.

   The SOURCE attribute has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=3     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Encoded-Address-Prefix
      The Source-prefix in the format described above.

   Nested Attributes
      No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
      nested within a SOURCE attribute.

e) FWDR_PREF (Type Code 4)

   The FWDR_PREF attribute provides a forwarder preference value for
   all GROUP or SOURCE attributes nested immediately within the
   FWDR_PREF attribute. It is used by a BGMP speaker to inform other
   BGMP speakers of the originating speaker's degree of preference for
   a given group or source prefix. Usage of this attribute is
   described in Section 5.5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=4     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Preference Value                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Preference Value
      A 32-bit non-negative integer.

   Nested Attributes
      No JOIN, PRUNE, or FWDR_PREF attributes may be immediately
      nested within a FWDR_PREF attribute.

8.4.
Encoding examples

Below are enumerated examples of how various updates are built using
nested attributes, where A ( B ) denotes that attribute B is nested
within attribute A.

(*,G-prefix) Join:                      JOIN ( GROUP )
(*,G-prefix) Prune:                     PRUNE ( GROUP )
(S,G) Join towards S:                   GROUP ( JOIN ( SOURCE ) )
(S,G) Join cancelling prune towards G:  GROUP ( JOIN ( SOURCE ) )
(S,G) Prune towards S:                  GROUP ( PRUNE ( SOURCE ) )
(S,G) Prune towards G:                  GROUP ( PRUNE ( SOURCE ) )
Switch from (*,G) to (S,G):             PRUNE ( GROUP ( JOIN ( SOURCE ) ) )
Switch from (S,G) to (*,G):             JOIN ( GROUP )
Initial (*,G) Join with S pruned:       JOIN ( GROUP ( PRUNE ( SOURCE ) ) )
Forwarder preference announcement for G-prefix:  FWDR_PREF ( GROUP )
Forwarder preference announcement for S-prefix:  FWDR_PREF ( SOURCE )

8.5. KEEPALIVE Message Format

BGMP does not use any transport protocol-based keep-alive mechanism
to determine if peers are reachable. Instead, KEEPALIVE messages are
exchanged between peers often enough that the Hold Timer does not
expire. A reasonable maximum time between the last KEEPALIVE or
UPDATE message sent, and the time at which a KEEPALIVE message is
sent, would be one third of the Hold Time interval. KEEPALIVE
messages MUST NOT be sent more frequently than one per second. An
implementation MAY adjust the rate at which it sends KEEPALIVE
messages as a function of the Hold Time interval.

If the negotiated Hold Time interval is zero, then periodic KEEPALIVE
messages MUST NOT be sent.

A KEEPALIVE message consists of only a message header, and has a
length of 4 octets.

8.6. NOTIFICATION Message Format

A NOTIFICATION message is sent when an error condition is detected.
The BGMP connection is closed immediately after sending it.

In addition to the fixed-size BGMP header, the NOTIFICATION message
contains the following fields:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Error code   | Error subcode |             Data              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Error Code:

   This 1-octet unsigned integer indicates the type of
   NOTIFICATION. The following Error Codes have been defined:

      Error Code    Symbolic Name                 Reference

          1         Message Header Error          Section 9.1
          2         OPEN Message Error            Section 9.2
          3         UPDATE Message Error          Section 9.3
          4         Hold Timer Expired            Section 9.5
          5         Finite State Machine Error    Section 9.6
          6         Cease                         Section 9.7

Error subcode:

   This 1-octet unsigned integer provides more specific
   information about the nature of the reported error. Each Error
   Code may have one or more Error Subcodes associated with it.
   If no appropriate Error Subcode is defined, then a zero
   (Unspecific) value is used for the Error Subcode field.

   Message Header Error subcodes:

       2 - Bad Message Length.
       3 - Bad Message Type.

   OPEN Message Error subcodes:

       1 - Unsupported Version Number
       4 - Unsupported Optional Parameter
       5 - Authentication Failure
       6 - Unacceptable Hold Time
       7 - Unsupported Capability

   UPDATE Message Error subcodes:

       1 - Malformed Attribute List
       2 - Unrecognized Well-known Attribute
       5 - Attribute Length Error
      10 - Invalid Prefix Field

Data:
   This variable-length field is used to diagnose the reason for the
   NOTIFICATION. The contents of the Data field depend upon the
   Error Code and Error Subcode. See Section 9 below for more
   details.
1366 Note that the length of the Data field can be determined from the
1367 message Length field by the formula:

1369    Message Length = 6 + Data Length

1371 The minimum length of the NOTIFICATION message is 6 octets
1372 (including message header).

1374 9. BGMP Error Handling

1376 This section describes actions to be taken when errors are detected
1377 while processing BGMP messages. BGMP Error Handling is similar to
1378 that of BGP [BGP].

1382 When any of the conditions described here are detected, a
1383 NOTIFICATION message with the indicated Error Code, Error Subcode,
1384 and Data fields is sent, and the BGMP connection is closed. If no
1385 Error Subcode is specified, then a zero must be used.

1387 The phrase "the BGMP connection is closed" means that the transport
1388 protocol connection has been closed and that all resources for that
1389 BGMP connection have been deallocated. The remote peer is removed
1390 from the target list of all tree state entries.

1392 Unless specified explicitly, the Data field of the NOTIFICATION
1393 message that is sent to indicate an error is empty.

1395 9.1. Message Header error handling

1397 All errors detected while processing the Message Header are indicated
1398 by sending the NOTIFICATION message with Error Code Message Header
1399 Error. The Error Subcode elaborates on the specific nature of the
1400 error.

1402 If the Length field of the message header is less than 4 or greater
1403 than 4096, or if the Length field of an OPEN message is less than
1404 the minimum length of the OPEN message, or if the Length field of an
1405 UPDATE message is less than the minimum length of the UPDATE message,
1406 or if the Length field of a KEEPALIVE message is not equal to 4, then
1407 the Error Subcode is set to Bad Message Length. The Data field
1408 contains the erroneous Length field.
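
The length checks in the preceding paragraph can be sketched in Python as follows. This is a minimal illustration, not a normative procedure. The function name, the string message-type labels, and the caller-supplied minimum OPEN and UPDATE lengths are assumptions made for the example (those minimums are defined by Sections 8.2 and 8.3, not here), and the Length field is assumed to be 2 octets wide; Section 8.1 is authoritative for the header layout.

```python
import struct

HEADER_LEN = 4            # a KEEPALIVE is a bare header of 4 octets
MAX_MESSAGE_LEN = 4096
MESSAGE_HEADER_ERROR = 1  # Error Code
BAD_MESSAGE_LENGTH = 2    # Error Subcode


def check_message_length(msg_type, length, min_open_len, min_update_len):
    """Return None if the Length field is acceptable, otherwise the
    (error_code, error_subcode, data) triple for the NOTIFICATION.

    msg_type is an illustrative label: "OPEN", "UPDATE", "KEEPALIVE"
    or "NOTIFICATION". min_open_len and min_update_len are supplied
    by the caller (assumption: taken from Sections 8.2 and 8.3).
    """
    bad = (
        length < HEADER_LEN
        or length > MAX_MESSAGE_LEN
        or (msg_type == "OPEN" and length < min_open_len)
        or (msg_type == "UPDATE" and length < min_update_len)
        or (msg_type == "KEEPALIVE" and length != HEADER_LEN)
    )
    if not bad:
        return None
    # The Data field carries the erroneous Length field
    # (assumed here to be a 2-octet unsigned integer).
    data = struct.pack("!H", length & 0xFFFF)
    return (MESSAGE_HEADER_ERROR, BAD_MESSAGE_LENGTH, data)
```

For example, check_message_length("KEEPALIVE", 5, 12, 8) reports Bad Message Length because a KEEPALIVE must be exactly 4 octets; the values 12 and 8 are placeholders, not the real minimums.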
1410 If the Type field of the message header is not recognized, then the
1411 Error Subcode is set to Bad Message Type. The Data field contains
1412 the erroneous Type field.

1414 9.2. OPEN message error handling

1416 All errors detected while processing the OPEN message are indicated
1417 by sending the NOTIFICATION message with Error Code OPEN Message
1418 Error. The Error Subcode elaborates on the specific nature of the
1419 error.

1421 If the version number contained in the Version field of the received
1422 OPEN message is not supported, then the Error Subcode is set to
1423 Unsupported Version Number. The Data field is a 2-octet unsigned
1424 integer, which indicates the largest locally supported version number
1428 less than the version the remote BGMP peer bid (as indicated in the
1429 received OPEN message).

1431 If the Hold Time field of the OPEN message is unacceptable, then the
1432 Error Subcode MUST be set to Unacceptable Hold Time. An
1433 implementation MUST reject Hold Time values of one or two seconds.
1434 An implementation MAY reject any proposed Hold Time. An
1435 implementation which accepts a Hold Time MUST use the negotiated
1436 value for the Hold Time.

1438 If one of the Optional Parameters in the OPEN message is not
1439 recognized, then the Error Subcode is set to Unsupported Optional
1440 Parameter.

1442 If the OPEN message carries Authentication Information (as an
1443 Optional Parameter), then the corresponding authentication procedure
1444 is invoked. If the authentication procedure (based on Authentication
1445 Code and Authentication Data) fails, then the Error Subcode is set to
1446 Authentication Failure.

1448 If the OPEN message indicates that the peer does not support a
1449 capability which the receiver requires, the receiver may send a
1450 NOTIFICATION message to the peer, and terminate peering. The Error
1451 Subcode in the message is set to Unsupported Capability. The Data
1452 field in the NOTIFICATION message lists the set of capabilities that
1453 cause the speaker to send the message. Each such capability is
1454 encoded the same way as it was encoded in the received OPEN message.

1456 9.3. UPDATE message error handling

1458 All errors detected while processing the UPDATE message are indicated
1459 by sending the NOTIFICATION message with Error Code UPDATE Message
1460 Error. The Error Subcode elaborates on the specific nature of the
1461 error.

1463 If any recognized attribute has an Attribute Length that conflicts with
1464 the expected length (based on the attribute type code), then the
1465 Error Subcode is set to Attribute Length Error. The Data field
1466 contains the erroneous attribute (type, length and value).

1468 If the Encoded-Address-Prefix field in some attribute is
1472 syntactically incorrect, then the Error Subcode is set to Invalid
1473 Prefix Field.

1475 If any other error is encountered when processing attributes (such as
1476 invalid nestings), then the Error Subcode is set to Malformed
1477 Attribute List, and the problematic attribute is included in the Data
1478 field.

1480 9.4. NOTIFICATION message error handling

1482 If a peer sends a NOTIFICATION message, and there is an error in that
1483 message, there is unfortunately no means of reporting this error via
1484 a subsequent NOTIFICATION message. Any such error, such as an
1485 unrecognized Error Code or Error Subcode, should be noticed, logged
1486 locally, and brought to the attention of the administration of the
1487 peer. The means to do this, however, lies outside the scope of this
1488 document.

1490 9.5. Hold Timer Expired error handling

1492 If a system does not receive successive KEEPALIVE and/or UPDATE
1493 and/or NOTIFICATION messages within the period specified in the Hold
1494 Time field of the OPEN message, then the NOTIFICATION message with
1495 Hold Timer Expired Error Code must be sent and the BGMP connection
1496 closed.

1498 9.6. Finite State Machine error handling

1500 Any error detected by the BGMP Finite State Machine (e.g., receipt of
1501 an unexpected event) is indicated by sending the NOTIFICATION message
1502 with Error Code Finite State Machine Error.

1504 9.7. Cease

1506 In the absence of any fatal errors (that are indicated in this section),
1507 a BGMP peer may choose at any given time to close its BGMP connection
1508 by sending the NOTIFICATION message with Error Code Cease. However,
1509 the Cease NOTIFICATION message must not be used when a fatal error
1510 indicated by this section does exist.

1514 9.8. Connection collision detection

1516 If a pair of BGMP speakers try simultaneously to establish a TCP
1517 connection to each other, then two parallel connections between this
1518 pair of speakers might well be formed. We refer to this situation as
1519 connection collision. Clearly, one of these connections must be
1520 closed.

1522 Based on the value of the BGMP Identifier, a convention is established
1523 for detecting which BGMP connection is to be preserved when a
1524 collision does occur. The convention is to compare the BGMP
1525 Identifiers of the peers involved in the collision and to retain only
1526 the connection initiated by the BGMP speaker with the higher-valued
1527 BGMP Identifier.

1529 Upon receipt of an OPEN message, the local system must examine all of
1530 its connections that are in the OpenConfirm state. A BGMP speaker
1531 may also examine connections in an OpenSent state if it knows the
1532 BGMP Identifier of the peer by means outside of the protocol. If
1533 among these connections there is a connection to a remote BGMP
1534 speaker whose BGMP Identifier equals the one in the OPEN message,
1535 then the local system performs the following collision resolution
1536 procedure:

1538 1. The BGMP Identifier of the local system is compared to the BGMP
1539 Identifier of the remote system (as specified in the OPEN message).

1541 2. If the value of the local BGMP Identifier is less than the remote
1542 one, the local system closes the BGMP connection that already exists (the
1543 one that is already in the OpenConfirm state), and accepts the BGMP
1544 connection initiated by the remote system.

1546 3. Otherwise, the local system closes the newly created BGMP connection
1547 (the one associated with the newly received OPEN message), and
1548 continues to use the existing one (the one that is already in the
1549 OpenConfirm state).

1551 Comparing BGMP Identifiers is done by treating them as (4-octet long)
1552 unsigned integers.

1554 A connection collision with an existing BGMP connection that is in
1555 the Established state causes unconditional closing of the newly created
1556 connection. Note that a connection collision cannot be detected with
1557 connections that are in the Idle, Connect, or Active states.

1561 Closing the BGMP connection (that results from the collision
1562 resolution procedure) is accomplished by sending the NOTIFICATION
1563 message with the Error Code Cease.

1565 10. BGMP Version Negotiation

1567 BGMP speakers may negotiate the version of the protocol by making
1568 multiple attempts to open a BGMP connection, starting with the
1569 highest version number each supports. If an open attempt fails with
1570 an Error Code OPEN Message Error, and an Error Subcode Unsupported
1571 Version Number, then the BGMP speaker has available the version
1572 number it tried, the version number its peer tried, the version
1573 number passed by its peer in the NOTIFICATION message, and the
1574 version numbers that it supports. If the two peers do support one or
1575 more common versions, then this will allow them to rapidly determine
1576 the highest common version. In order to support BGMP version
1577 negotiation, future versions of BGMP must retain the format of the
1578 OPEN and NOTIFICATION messages.

1580 10.1. BGMP Capability Negotiation

1582 When a BGMP speaker sends an OPEN message to its BGMP peer, the
1583 message may include an Optional Parameter, called Capabilities. The
1584 parameter lists the capabilities supported by the speaker.

1586 A BGMP speaker may use a particular capability when peering with
1587 another speaker only if both speakers support that capability. A
1588 BGMP speaker determines the capabilities supported by its peer by
1589 examining the list of capabilities present in the Capabilities
1590 Optional Parameter carried by the OPEN message that the speaker
1591 receives from the peer.

1593 11. BGMP Finite State machine

1595 This section specifies BGMP operation in terms of a Finite State
1596 Machine (FSM). Following is a brief summary and overview of BGMP
1597 operations by state as determined by this FSM.

1599 Initially BGMP is in the Idle state.

1601 Idle state:

1605 In this state BGMP refuses all incoming BGMP connections. No
1606 resources are allocated to the peer. In response to the Start
1607 event (initiated by either system or operator) the local system
1608 initializes all BGMP resources, starts the ConnectRetry timer,
1609 initiates a transport connection to the other BGMP peer, while
1610 listening for a connection that may be initiated by the remote
1611 BGMP peer, and changes its state to Connect. The exact value of
1612 the ConnectRetry timer is a local matter, but should be
1613 sufficiently large to allow TCP initialization.

1615 If a BGMP speaker detects an error, it shuts down the connection
1616 and changes its state to Idle. Getting out of the Idle state
1617 requires generation of the Start event. If such an event is
1618 generated automatically, then persistent BGMP errors may result in
1619 persistent flapping of the speaker. To avoid such a condition it
1620 is recommended that Start events should not be generated
1621 immediately for a peer that was previously transitioned to Idle
1622 due to an error. For a peer that was previously transitioned to
1623 Idle due to an error, the time between consecutive generation of
1624 Start events, if such events are generated automatically, shall
1625 exponentially increase. The value of the initial timer shall be 60
1626 seconds. The time shall be doubled for each consecutive retry.

1628 Any other event received in the Idle state is ignored.

1630 Connect state:

1632 In this state BGMP is waiting for the transport protocol
1633 connection to be completed.

1635 If the transport protocol connection succeeds, the local system
1636 clears the ConnectRetry timer, completes initialization, sends an
1637 OPEN message to its peer, and changes its state to OpenSent. If
1638 the transport protocol connect fails (e.g., retransmission
1639 timeout), the local system restarts the ConnectRetry timer,
1640 continues to listen for a connection that may be initiated by the
1641 remote BGMP peer, and changes its state to Active.
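
The Idle-state rule for automatically generated Start events (an initial wait of 60 seconds, doubled for each consecutive retry) can be illustrated with a short Python generator. This is only a sketch: the name restart_delays is hypothetical, and the absence of an upper bound simply mirrors the text above, which specifies none. An implementation would presumably reset the sequence once a connection succeeds, but that is an assumption and not stated in this section.

```python
import itertools

INITIAL_RESTART_DELAY_S = 60  # "The value of the initial timer shall be 60 seconds."


def restart_delays(initial=INITIAL_RESTART_DELAY_S):
    """Yield the wait, in seconds, before each automatic Start event
    for a peer that was moved to Idle because of an error."""
    delay = initial
    while True:
        yield delay
        delay *= 2  # doubled for each consecutive retry


# First four waits: 60, 120, 240 and 480 seconds.
first_four = list(itertools.islice(restart_delays(), 4))
```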
1643 In response to the ConnectRetry timer expired event, the local
1644 system restarts the ConnectRetry timer, initiates a transport
1645 connection to the other BGMP peer, continues to listen for a
1646 connection that may be initiated by the remote BGMP peer, and
1647 stays in the Connect state.

1649 The Start event is ignored in the Connect state.

1653 In response to any other event (initiated by either system or
1654 operator), the local system releases all BGMP resources associated
1655 with this connection and changes its state to Idle.

1657 Active state:

1659 In this state BGMP is trying to acquire a peer by initiating a
1660 transport protocol connection.

1662 If the transport protocol connection succeeds, the local system
1663 clears the ConnectRetry timer, completes initialization, sends an
1664 OPEN message to its peer, sets its Hold Timer to a large value,
1665 and changes its state to OpenSent. A Hold Timer value of 4
1666 minutes is suggested.

1668 In response to the ConnectRetry timer expired event, the local
1669 system restarts the ConnectRetry timer, initiates a transport
1670 connection to the other BGMP peer, continues to listen for a
1671 connection that may be initiated by the remote BGMP peer, and
1672 changes its state to Connect.

1674 If the local system detects that a remote peer is trying to
1675 establish a BGMP connection to it, and the IP address of the remote
1676 peer is not an expected one, the local system restarts the
1677 ConnectRetry timer, rejects the attempted connection, continues to
1678 listen for a connection that may be initiated by the remote BGMP
1679 peer, and stays in the Active state.

1681 The Start event is ignored in the Active state.

1683 In response to any other event (initiated by either system or
1684 operator), the local system releases all BGMP resources associated
1685 with this connection and changes its state to Idle.
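
The Connect and Active state rules above lend themselves to a table-driven form. The Python sketch below is one possible rendering under stated assumptions: the State and Event enumerations, the event names, and the free-text action strings are all hypothetical labels chosen for the example, and only the transitions described in the text for these two states are covered.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()


class Event(Enum):  # illustrative event names, not defined by this document
    START = auto()
    TRANSPORT_OPEN = auto()         # transport connection succeeds
    TRANSPORT_FAILED = auto()       # transport connect fails
    CONNECT_RETRY_EXPIRED = auto()
    UNEXPECTED_PEER = auto()        # connection attempt from an unexpected address
    STOP = auto()


LISTEN = "continue listening for a connection from the remote peer"

# (state, event) -> (next state, actions), following the text above.
TRANSITIONS = {
    (State.CONNECT, Event.TRANSPORT_OPEN): (State.OPEN_SENT, [
        "clear ConnectRetry timer", "complete initialization", "send OPEN"]),
    (State.CONNECT, Event.TRANSPORT_FAILED): (State.ACTIVE, [
        "restart ConnectRetry timer", LISTEN]),
    (State.CONNECT, Event.CONNECT_RETRY_EXPIRED): (State.CONNECT, [
        "restart ConnectRetry timer", "initiate transport connection", LISTEN]),
    (State.ACTIVE, Event.TRANSPORT_OPEN): (State.OPEN_SENT, [
        "clear ConnectRetry timer", "complete initialization", "send OPEN",
        "set Hold Timer to a large value (4 minutes suggested)"]),
    (State.ACTIVE, Event.CONNECT_RETRY_EXPIRED): (State.CONNECT, [
        "restart ConnectRetry timer", "initiate transport connection", LISTEN]),
    (State.ACTIVE, Event.UNEXPECTED_PEER): (State.ACTIVE, [
        "restart ConnectRetry timer", "reject the attempted connection", LISTEN]),
}


def step(state, event):
    """Return (next_state, actions) for the Connect or Active state."""
    if state not in (State.CONNECT, State.ACTIVE):
        raise ValueError("this sketch covers only the Connect and Active states")
    if event is Event.START:
        return state, []  # the Start event is ignored in both states
    # Any other event releases the connection's resources and moves to Idle.
    default = (State.IDLE, ["release all BGMP resources for this connection"])
    return TRANSITIONS.get((state, event), default)
```

Keeping the transitions in a dictionary makes the "any other event" rule a single default lookup rather than a branch repeated per state.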
1687 OpenSent state:

1689 In this state BGMP waits for an OPEN message from its peer. When
1690 an OPEN message is received, all fields are checked for
1691 correctness. If the BGMP message header checking or OPEN message
1692 checking detects an error (see Sections 9.1 and 9.2), or a connection
1693 collision (see Section 9.8), the local system sends a NOTIFICATION
1694 message and changes its state to Idle.

1696 If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
1697 message and sets a KeepAlive timer. The Hold Timer, which was
1701 originally set to a large value (see above), is replaced with the
1702 negotiated Hold Time value (see Section 8.2). If the negotiated
1703 Hold Time value is zero, then the Hold Timer and KeepAlive
1704 timer are not started. If the value of the Autonomous System
1705 field is the same as the local Autonomous System number, then the
1706 connection is an "internal" connection; otherwise, it is
1707 "external". Finally, the state is changed to OpenConfirm.

1709 If a disconnect notification is received from the underlying
1710 transport protocol, the local system closes the BGMP connection,
1711 restarts the ConnectRetry timer, continues to listen for a
1712 connection that may be initiated by the remote BGMP peer, and goes
1713 into the Active state.

1715 If the Hold Timer expires, the local system sends a NOTIFICATION
1716 message with Error Code Hold Timer Expired and changes its state
1717 to Idle.

1719 In response to the Stop event (initiated by either system or
1720 operator) the local system sends a NOTIFICATION message with Error
1721 Code Cease and changes its state to Idle.

1723 The Start event is ignored in the OpenSent state.

1725 In response to any other event the local system sends a NOTIFICATION
1726 message with Error Code Finite State Machine Error and changes its
1727 state to Idle.
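
A minimal Python sketch of the OpenSent handling described above follows. It assumes the OPEN message has already passed header and OPEN error checking and collision detection; the function names, the event strings, and the dictionary layout are hypothetical, and the Error Code values are those of the NOTIFICATION Error Code table in Section 8.6.

```python
HOLD_TIMER_EXPIRED = 4
FINITE_STATE_MACHINE_ERROR = 5
CEASE = 6


def on_valid_open(negotiated_hold_time, peer_as, local_as):
    """Actions taken in OpenSent when an error-free OPEN is received."""
    timers_started = negotiated_hold_time != 0
    return {
        "send": ["KEEPALIVE"],
        # A negotiated Hold Time of zero means neither timer is started.
        "hold_timer": negotiated_hold_time if timers_started else None,
        "keepalive_timer_started": timers_started,
        "connection": "internal" if peer_as == local_as else "external",
        "next_state": "OpenConfirm",
    }


def on_other_event(event):
    """Return (next_state, notification_error_code or None) for events
    other than the receipt of an OPEN message.

    The event strings are illustrative labels for this sketch.
    """
    if event == "start":
        return ("OpenSent", None)   # Start is ignored in OpenSent
    if event == "transport_disconnect":
        return ("Active", None)     # restart ConnectRetry, keep listening
    if event == "hold_timer_expired":
        return ("Idle", HOLD_TIMER_EXPIRED)
    if event == "stop":
        return ("Idle", CEASE)
    return ("Idle", FINITE_STATE_MACHINE_ERROR)
```

For instance, on_valid_open(0, 64512, 64512) leaves both timers stopped and classifies the connection as internal; the Autonomous System numbers here are arbitrary example values.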
1729 Whenever BGMP changes its state from OpenSent to Idle, it closes
1730 the BGMP (and transport-level) connection and releases all
1731 resources associated with that connection.

1733 OpenConfirm state:

1735 In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.

1737 If the local system receives a KEEPALIVE message, it changes its
1738 state to Established.

1740 If the Hold Timer expires before a KEEPALIVE message is received,
1741 the local system sends a NOTIFICATION message with Error Code Hold
1742 Timer Expired and changes its state to Idle.

1744 If the local system receives a NOTIFICATION message, it changes
1745 its state to Idle.

1749 If the KeepAlive timer expires, the local system sends a KEEPALIVE
1750 message and restarts its KeepAlive timer.

1752 If a disconnect notification is received from the underlying
1753 transport protocol, the local system changes its state to Idle.

1755 In response to the Stop event (initiated by either system or
1756 operator) the local system sends a NOTIFICATION message with Error
1757 Code Cease and changes its state to Idle.

1759 The Start event is ignored in the OpenConfirm state.

1761 In response to any other event the local system sends a NOTIFICATION
1762 message with Error Code Finite State Machine Error and changes its
1763 state to Idle.

1765 Whenever BGMP changes its state from OpenConfirm to Idle, it
1766 closes the BGMP (and transport-level) connection and releases all
1767 resources associated with that connection.

1769 Established state:

1771 In the Established state BGMP can exchange UPDATE, NOTIFICATION,
1772 and KEEPALIVE messages with its peer.

1774 If the local system receives an UPDATE or KEEPALIVE message, it
1775 restarts its Hold Timer, if the negotiated Hold Time value is
1776 non-zero.

1778 If the local system receives a NOTIFICATION message, it changes
1779 its state to Idle.
1781 If the local system receives an UPDATE message and the UPDATE
1782 message error handling procedure (see Section 9.3) detects an
1783 error, the local system sends a NOTIFICATION message and changes
1784 its state to Idle.

1786 If a disconnect notification is received from the underlying
1787 transport protocol, the local system changes its state to Idle.

1789 If the Hold Timer expires, the local system sends a NOTIFICATION
1790 message with Error Code Hold Timer Expired and changes its state
1791 to Idle.

1793 If the KeepAlive timer expires, the local system sends a KEEPALIVE
1797 message and restarts its KeepAlive timer.

1799 Each time the local system sends a KEEPALIVE or UPDATE message, it
1800 restarts its KeepAlive timer, unless the negotiated Hold Time
1801 value is zero.

1803 In response to the Stop event (initiated by either system or
1804 operator), the local system sends a NOTIFICATION message with
1805 Error Code Cease and changes its state to Idle.

1807 The Start event is ignored in the Established state.

1809 In response to any other event, the local system sends a
1810 NOTIFICATION message with Error Code Finite State Machine Error
1811 and changes its state to Idle.

1813 Whenever BGMP changes its state from Established to Idle, it
1814 closes the BGMP (and transport-level) connection, releases all
1815 resources associated with that connection, and deletes all routes
1816 derived from that connection.

1818 12. Security Considerations

1820 Security issues are not discussed in this memo.

1822 13. Authors' Addresses

1824 Dave Thaler
1825 Department of Electrical Engineering and Computer Science
1826 Microsoft
1827 One Microsoft Way
1828 Redmond, WA 98052
1829 EMail: dthaler@microsoft.com

1831 Deborah Estrin
1832 Computer Science Dept./ISI
1833 University of Southern California
1834 Los Angeles, CA 90089
1835 EMail: estrin@usc.edu

1837 David Meyer
1838 Cisco Systems
1842 San Jose, CA
1843 EMail: dmm@cisco.com

1845 14. References

1847 [BGP]
1848 Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
1849 1771, March 1995.

1851 [MBGP]
1852 Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
1853 Extensions for BGP-4", RFC 2283, February 1998.

1855 [CBT]
1856 Ballardie, A. J., "Core Based Trees (CBT) Multicast: Architectural
1857 Overview and Specification", University College London, November
1858 1994.

1860 [CBTDM]
1861 Ballardie, A., "Core Based Tree (CBT) Multicast Border Router
1862 Specification", draft-ietf-idmr-cbt-br-spec-00.txt, October 1997.

1864 [DVMRP]
1865 Pusateri, T., "Distance Vector Multicast Routing Protocol",
1866 draft-ietf-idmr-dvmrp-v3-05.txt, October 1997.

1868 [DWR]
1869 Fenner, W., "Domain-Wide Reports", Work in progress.

1871 [INTEROP]
1872 Thaler, D., "Interoperability Rules for Multicast Routing
1873 Protocols", draft-thaler-multicast-interop-01.txt, March 1997.

1875 [IPv6MAA]
1876 R. Hinden, S. Deering, "IPv6 Multicast Address Assignments",
1877 draft-ietf-ipngwg-multicast-assgn-04.txt, July 1997.

1879 [ISSUES]
1880 Meyer, D., "Some Issues for an Inter-domain Multicast Routing
1881 Protocol", draft-ietf-mboned-imrp-some-issues-02.txt, June 1997.

1883 [MASC]
1884 Estrin, D., Handley, M., and D. Thaler, "Multicast-Address-Set
1885 advertisement and Claim mechanism", Work in Progress, June 1997.

1889 [MOSPF]
1890 Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
1891 1994.

1893 [PIMDM]
1894 Estrin, et al., "Protocol Independent Multicast-Dense Mode
1895 (PIM-DM): Protocol Specification", draft-ietf-idmr-pim-dm-spec-05.txt,
1896 May 1997.

1898 [PIMSM]
1899 Estrin, et al., "Protocol Independent Multicast-Sparse Mode
1900 (PIM-SM): Protocol Specification", RFC 2117, June 1997.

1902 [REFLECT]
1903 Bates, T., and R. Chandra, "BGP Route Reflection: An alternative to
1904 full mesh IBGP", RFC 1966, June 1996.

1906 [RFC1700]
1907 S. J. Reynolds, J. Postel, "ASSIGNED NUMBERS", RFC 1700, October
1908 1994.

1910 [RFC1771]
1911 Y. Rekhter, T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771,
1912 March 1995.

1914 [RFC2119]
1915 S. Bradner, "Key words for use in RFCs to Indicate Requirement
1916 Levels", BCP 14, RFC 2119, March 1997.

1918 15. Full Copyright Statement

1920 Copyright (C) The Internet Society (2000). All Rights Reserved.

1922 This document and translations of it may be copied and furnished to
1923 others, and derivative works that comment on or otherwise explain it or
1924 assist in its implementation may be prepared, copied, published and
1925 distributed, in whole or in part, without restriction of any kind,
1926 provided that the above copyright notice and this paragraph are included
1927 on all such copies and derivative works. However, this document itself
1928 may not be modified in any way, such as by removing the copyright notice
1929 or references to the Internet Society or other Internet organizations,
1930 except as needed for the purpose of developing Internet standards in
1931 which case the procedures for copyrights defined in the Internet
1935 Standards process must be followed, or as required to translate it into
     languages other than English.

1937 The limited permissions granted above are perpetual and will not be
1938 revoked by the Internet Society or its successors or assigns.
1940 This document and the information contained herein is provided on an "AS
1941 IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
1942 FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
1943 LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
1944 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
1945 FITNESS FOR A PARTICULAR PURPOSE.

1947 Table of Contents

1949 1 Acknowledgements ................................................ 2
1950 2 Purpose ......................................................... 2
1951 3 Terminology ..................................................... 3
1952 4 Protocol Overview ............................................... 5
1953 4.1 Design Rationale .............................................. 6
1954 5 Protocol Details ................................................ 8
1955 5.1 Interaction with the EGP ...................................... 8
1956 5.2 Multicast Data Packet Processing .............................. 9
1957 5.3 BGMP processing of Join and Prune messages and notifications
1958 .............................................................. 10
1959 5.3.1 Receiving Joins ............................................. 10
1960 5.3.2 Receiving Prune Notifications ............................... 11
1961 5.3.3 Receiving Route Change Notifications ........................ 11
1962 5.4 Interaction with M-IGP components ............................. 12
1963 5.4.1 Interaction with DVMRP and PIM-DM ........................... 13
1964 5.4.2 Interaction with PIM-SM ..................................... 14
1965 5.4.3 Interaction with CBT ........................................ 15
1966 5.4.4 Interaction with MOSPF ...................................... 16
1967 5.5 Operation over Multi-access Networks .......................... 16
1968 6 Interaction with address allocation ............................. 17
1969 6.1 Requirements for BGMP components .............................. 17
1970 7 Transition Strategy ............................................. 17
1971 7.1 Preventing transit through the MBone stub ..................... 19
1972 8 Message Formats ................................................. 20
1973 8.1 Message Header Format ......................................... 20
1974 8.2 OPEN Message Format ........................................... 21
1975 8.3 UPDATE Message Format ......................................... 24
1976 8.4 Encoding examples ............................................. 28
1977 8.5 KEEPALIVE Message Format ...................................... 28
1978 8.6 NOTIFICATION Message Format ................................... 29
1982 9 BGMP Error Handling ............................................. 30
1983 9.1 Message Header error handling ................................. 31
1984 9.2 OPEN message error handling ................................... 31
1985 9.3 UPDATE message error handling ................................. 32
1986 9.4 NOTIFICATION message error handling ........................... 33
1987 9.5 Hold Timer Expired error handling ............................. 33
1988 9.6 Finite State Machine error handling ........................... 33
1989 9.7 Cease ......................................................... 33
1990 9.8 Connection collision detection ................................ 34
1991 10 BGMP Version Negotiation ....................................... 35
1992 10.1 BGMP Capability Negotiation .................................. 35
1993 11 BGMP Finite State machine ...................................... 35
1994 12 Security Considerations ........................................ 40
1995 13 Authors' Addresses ............................................. 40
1996 14 References ..................................................... 41
1997 15 Full Copyright Statement ....................................... 42